Hevo Airflow Provider

Last updated on Feb 20, 2026


The Hevo Airflow Provider enables you to integrate Hevo Pipelines into Directed Acyclic Graphs (DAGs) using modular components that follow standard Airflow conventions. These components include a Connection, a Hook, an Operator, and a Sensor. They rely entirely on Hevo APIs and do not change how Pipelines operate within the Hevo platform.

The following sections explain each component in detail.


Hevo Connection

The configuration component that enables the Hevo Airflow Provider to securely communicate with Hevo Pipelines. The Connection serves as a central repository for credentials and environment settings required for authentication, allowing Airflow to securely manage sensitive information and eliminating the need to store credentials directly in DAG files.

The Connection performs the following actions to manage the identity and access for the Hevo Airflow Provider:

  • Securely stores the credentials required to call Hevo APIs.

  • Defines the Base URL (Host) for the Hevo API, ensuring that the requests are routed to the correct region.

  • Acts as a single point of update for API credentials. When a key is rotated, it needs to be updated only in the connection settings, not in individual tasks.

  • Leverages Airflow’s encryption to ensure API keys are protected within the metadata database.

The Connection serves as the identity source for the HevoHook, which retrieves from it the following details required for orchestration (a configuration sketch follows this list):

  • The Hevo Airflow Provider uses hevo_airflow_conn_id as the default connection ID to retrieve credentials if no other ID is specified in the task.

  • The API key retrieved from the Connection is used to authorize all requests sent to Hevo Pipelines.

  • The Base URL ensures that all API calls triggered by Hevo Operators or Sensors are directed to the correct environment.
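
For example, a Connection matching the default ID can be defined in Python and exported through Airflow's standard AIRFLOW_CONN_<conn_id> environment-variable convention. The following is a minimal sketch only: the connection type and the field that holds the API key are assumptions, so verify them against the provider documentation.

    from airflow.models.connection import Connection

    # Build the Connection object. The conn_type and field mapping below are
    # assumptions; replace the API key placeholder with your own key.
    conn = Connection(
        conn_id="hevo_airflow_conn_id",    # default ID used by the provider
        conn_type="hevo",                  # assumed connection type
        host="https://us.hevodata.com",    # placeholder Base URL for your Hevo region
        password="<your-hevo-api-key>",    # assumed field for the API key
    )

    # Print the URI form, suitable for the AIRFLOW_CONN_HEVO_AIRFLOW_CONN_ID
    # environment variable.
    print(conn.get_uri())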


Hevo Hook

The foundational building block used by all other Hevo Airflow components. It serves as the primary interface between Airflow and Hevo’s APIs, managing the communication required for orchestration.

The Hevo Hook performs the following actions to handle low-level interaction with Hevo:

  • Manages connection and authentication requests to Hevo.

  • Calls the specific Hevo APIs required to trigger and manage your Pipelines.

  • Converts raw API responses into clean Python structures for easy consumption by the Operators and Sensors.

The Hevo Hook provides helper methods, illustrated in the sketch after this list, that allow the HevoOperator and HevoSensor to:

  • Trigger manual syncs or restart specific Pipelines.

  • Retrieve Pipeline settings and associated schemas.

  • Look up details of a specific job by ID or find the most recent active job of a given type in the Pipeline.
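
As an illustration, a custom task could call the hook directly. This is a sketch only: the import path and the helper names trigger_sync and get_job are hypothetical, so refer to the provider reference for the actual signatures.

    from airflow.decorators import task

    @task
    def run_manual_sync(pipeline_id: int) -> str:
        # Hypothetical import path and helper names, shown only to convey the pattern.
        from hevo_provider.hooks.hevo import HevoHook

        hook = HevoHook(hevo_conn_id="hevo_airflow_conn_id")
        job = hook.trigger_sync(pipeline_id)            # hypothetical helper
        details = hook.get_job(pipeline_id, job["id"])  # hypothetical helper
        return details["status"]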


Hevo Operator

The custom operator provided by Hevo to orchestrate a Hevo Pipeline from within an Airflow Directed Acyclic Graph (DAG). This component does the following:

  • Interacts with the HevoHook to retrieve the connection details and call the Hevo API.

  • For the specified Pipeline:

    • Validates the Pipeline state and configuration.

    • Starts a job to sync data.

The operator supports the following modes, shown together in the code sketch after this list:

  • Trigger and Continue
    In this mode, the operator task triggers the sync and marks the task as complete upon receiving a successful response from the API call. You can achieve this by setting the following parameter values:

    • wait_for_completion=False

    • deferrable=False

    Example flow:

    Source ingestion → HevoOperator (trigger and continue) → HevoSensor (wait) → Warehouse

  • Synchronous Wait on Worker
    In this mode, the operator task initiates the sync and then polls the Hevo API from the worker until the Pipeline job reaches a final state, such as Completed, Completed with Failures, or Failed. You can achieve this by setting the following parameter values:

    • wait_for_completion=True

    • deferrable=False

    Example flow:

    Source ingestion → HevoOperator (trigger and wait) → Warehouse

  • Trigger and Asynchronous Wait
    In this mode, the operator task initiates the sync and defers the monitoring to the triggerer. The task is resumed only after the Pipeline job finishes. You can achieve this by setting the following parameter values:

    • wait_for_completion=True

    • deferrable=True

    Example flow:

    Source ingestion → HevoOperator(wait_for_completion=True, deferrable=True) → Warehouse
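
Put together, the three modes correspond to the following task definitions. This is a minimal sketch; the import path is an assumption, and pipeline_id 123 is a placeholder.

    from hevo_provider.operators.hevo import HevoOperator  # assumed import path

    # Trigger and Continue: fire the sync and complete immediately.
    trigger_only = HevoOperator(
        task_id="trigger_sync",
        pipeline_id=123,
        wait_for_completion=False,
        deferrable=False,
    )

    # Synchronous Wait on Worker: hold the worker slot until the job finishes.
    trigger_and_wait = HevoOperator(
        task_id="trigger_and_wait",
        pipeline_id=123,
        wait_for_completion=True,
        deferrable=False,
    )

    # Trigger and Asynchronous Wait: defer the monitoring to the triggerer.
    trigger_deferred = HevoOperator(
        task_id="trigger_deferred",
        pipeline_id=123,
        wait_for_completion=True,
        deferrable=True,
    )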


Hevo Sensor

A special type of operator that monitors the status of a Pipeline job inside an Airflow Directed Acyclic Graph (DAG). It implements a polling-based strategy, acting as a synchronization point to ensure that downstream tasks are run only when the specified conditions are met.

The sensor performs the following actions to manage external orchestration:

  • Calls Hevo Pipeline Jobs APIs via HevoHook at a fixed interval.

  • Checks the status of the Pipeline job against a target list of success states:

    • If the status is in the target list, the sensor succeeds, and the downstream tasks are run.

    • If the status is not yet in the target list, the sensor continues checking at a fixed interval until it either succeeds or times out.

    • If the status is an error state that is not in the target list, the sensor task fails. Depending on the DAG configuration, the task may then be retried or the DAG run stopped.

The sensor’s polling behavior is controlled by the following parameters:

  • poke_interval: The fixed interval (in seconds) at which the sensor checks the status of the Hevo Pipeline.

  • timeout: The maximum time (in seconds) for which the sensor checks the Pipeline status against the target list. The task is marked as failed if the Pipeline job has not reached a target status within this time.

The sensor polls the Pipeline status in one of the following ways, contrasted in the sketch after this list:

  • Standard polling
    In this mode, the sensor occupies a worker slot for the entire time that it waits. It polls the API at a frequency defined by the poke_interval.

  • Deferrable
    This mode uses Airflow's triggerer mechanism. The sensor initiates an asynchronous poll request, releases the worker slot, and moves the monitoring logic to the triggerer service.
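
The two polling modes are contrasted in the following sketch. The import path is an assumption, and the Pipeline ID and timing values are placeholders.

    from hevo_provider.sensors.hevo import HevoSensor  # assumed import path

    # Standard polling: occupies a worker slot, checking every 60 seconds
    # for at most one hour.
    standard_sensor = HevoSensor(
        task_id="wait_standard",
        pipeline_id=123,
        poke_interval=60,
        timeout=3600,
    )

    # Deferrable: releases the worker slot and hands monitoring to the triggerer.
    deferrable_sensor = HevoSensor(
        task_id="wait_deferrable",
        pipeline_id=123,
        poke_interval=60,
        timeout=3600,
        deferrable=True,
    )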

The sensor discovers the job to monitor using one of the following mechanisms:

  • Explicit Job ID:
    In this mode, the sensor monitors a specific job instance, whose details are stored in Airflow’s cross-communication (XCom) system. In the following example, the sensor monitors the job that it receives from the trigger_sync task.

    sensor = HevoSensor(
        task_id="wait_for_sync",
        pipeline_id=123,
        job_id="{{ ti.xcom_pull(task_ids='trigger_sync') }}"
    )
    
  • Auto-discovery:
    In this mode, the sensor waits for an active job of the specified job type in the Pipeline that it is monitoring. During this time, the sensor:

    • Applies an initial delay (wait_for_job_initial_delay) to allow time for the job to be created.

    • Polls up to wait_for_job_max_attempts times to find the job.

    • Waits wait_for_job_interval seconds between discovery attempts.

    In the following example, the sensor monitors the most recent incremental job for Pipeline ID 123.

    sensor = HevoSensor(
        task_id="wait_for_sync",
        pipeline_id=123,
        job_type=JobType.INCREMENTAL  # No job_id provided
    )
    

    This discovery mode is useful for monitoring externally triggered jobs or when the job_id is unavailable.
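
The discovery parameters can also be set explicitly, as in the following sketch; the values shown are illustrative placeholders, not the provider's defaults.

    sensor = HevoSensor(
        task_id="wait_for_external_sync",
        pipeline_id=123,
        job_type=JobType.INCREMENTAL,
        wait_for_job_initial_delay=30,  # seconds to wait before the first lookup
        wait_for_job_max_attempts=10,   # discovery attempts before giving up
        wait_for_job_interval=15,       # seconds between discovery attempts
    )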


Hevo Trigger

An asynchronous monitoring unit that powers the Hevo Sensor in deferrable mode. The trigger runs in Airflow’s triggerer service as a separate process. It moves the monitoring logic from the Airflow worker to the triggerer service, releasing the worker.

The trigger takes over the orchestration when a sensor is deferred. It performs the following actions:

  • Calls the Hevo Pipeline Job APIs through the triggerer service, so the Airflow worker is free while the job is being checked.

  • Continuously monitors the Pipeline job’s status at a frequency defined by the poke_interval until the job reaches a final state or the timeout defined by Airflow is exceeded.

Once the job being monitored by the trigger reaches its final state, control is returned to the Airflow worker via a TriggerEvent. This event is emitted under any of the following conditions (see the example after this list):

  • Successful Completion: The job reaches a state defined in the target success list.

  • Partial Success: The job completes with failures, and the accept_completed_with_failures parameter is enabled.

  • Job Failure: The job reaches a terminal state that is not in the target success list.

  • System Error: An unrecoverable issue occurs, such as an unreachable API or network failure.
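
For example, a deferrable sensor could be configured to treat the Completed with Failures state as success. This is a minimal sketch; whether accept_completed_with_failures is passed to the sensor, the operator, or both is an assumption.

    # Deferrable sensor that also accepts the "Completed with Failures" state.
    # The parameter placement is assumed; check the provider reference.
    sensor = HevoSensor(
        task_id="wait_tolerant",
        pipeline_id=123,
        deferrable=True,
        accept_completed_with_failures=True,
    )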


Revision History

Refer to the following table for the list of key updates made to this page:

Date          Release   Description of Change
Feb-20-2026   NA        New document.
