Integrating Hevo with Airflow

Last updated on Feb 20, 2026

Edge Pipeline is now available for Public Review. You can explore and evaluate its features and share your feedback.

Hevo integrates with Apache Airflow through the Hevo Airflow Provider. This integration provides dedicated tasks that start Pipeline runs and monitor jobs through Hevo’s external APIs.

These tasks are organized into Directed Acyclic Graphs (DAGs), which define the order, dependencies, and timing of each step in your data flow. By incorporating Hevo tasks into a DAG, you can align data ingestion in your Pipelines with your broader automated workflows.
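For example, the following minimal DAG (assuming Airflow 2.x) defines two placeholder tasks and the order in which they run; the DAG ID and task IDs are illustrative:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# A minimal DAG with two placeholder tasks executed in order.
# The dag_id and task_ids are illustrative.
with DAG(
    dag_id="hevo_example_dag",
    start_date=datetime(2026, 2, 20),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = EmptyOperator(task_id="ingest")
    transform = EmptyOperator(task_id="transform")

    ingest >> transform  # transform runs only after ingest completes
```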

How the Integration Works

Hevo Airflow Provider and Airflow work together as follows:

  • A Hevo Pipeline moves and transforms data from a Source to the Destination.

  • External APIs are used to trigger Pipeline runs and retrieve details of past and active jobs in the Pipelines.

  • Airflow DAGs orchestrate the flow by:

    • Using Hevo-specific tasks, such as operators and sensors, to trigger the Pipeline jobs and wait for their completion.

    • Chaining Hevo tasks with other processes, such as dbt transformations.

Airflow handles scheduling and monitoring, while Hevo manages the data flow. You can design Edge Pipelines so that Airflow can effectively orchestrate them. Read Hevo Airflow Provider for more information on its components.
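As a rough illustration of the API-driven flow described above, the sketch below triggers a Pipeline run and retrieves job details over HTTP. The base URL, endpoint paths, and authentication scheme are placeholders, not Hevo’s actual API; refer to Hevo’s API documentation for the real interface:

```python
import requests

# Illustrative only: the base URL, endpoint paths, and auth scheme below
# are placeholders, not Hevo's documented API. Consult Hevo's API
# reference for the actual endpoints.
BASE_URL = "https://<your-hevo-region>.example.com/api"
AUTH = ("<access-key>", "<secret-key>")
PIPELINE_ID = "<pipeline-id>"

# Trigger a Pipeline run (hypothetical endpoint).
resp = requests.post(f"{BASE_URL}/pipelines/{PIPELINE_ID}/run", auth=AUTH)
resp.raise_for_status()

# Retrieve details of past and active jobs (hypothetical endpoint).
jobs = requests.get(f"{BASE_URL}/pipelines/{PIPELINE_ID}/jobs", auth=AUTH).json()
print(jobs)
```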


Designing Pipelines for External Orchestration

You can design Pipelines for external orchestration based on whether you prefer Hevo or Airflow to control when the Pipeline runs. Refer to the following sections for guidelines on designing these Pipelines.

Pipelines with Fixed Schedules

Create a Pipeline that runs on a fixed schedule when you want:

  • Hevo to determine the data replication frequency, such as every 30 minutes or every 2 hours.

  • The orchestrator to wait for the latest run to complete before starting downstream tasks.

Designing the Pipeline

  1. Configure the Edge Pipeline with the sync frequency set to Scheduled Run.

  2. In Airflow, create a DAG that uses the Hevo Sensor to wait for the latest Pipeline run to complete (see the sketch after these steps).

  3. Once the sensor succeeds, run your downstream tasks, such as dbt transformations on the Destination data.
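A minimal sketch of such a DAG, assuming the Hevo Airflow Provider exposes a sensor class named HevoPipelineSensor; the class name, import path, connection ID, and parameters here are illustrative, so check the provider documentation for the actual interface:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical import; the actual module path and class name may differ.
from hevo_provider.sensors import HevoPipelineSensor

with DAG(
    dag_id="hevo_scheduled_pipeline_dag",
    start_date=datetime(2026, 2, 20),
    schedule="@hourly",  # align with the Pipeline's sync frequency
    catchup=False,
) as dag:
    # Wait for the latest scheduled Pipeline run to complete.
    wait_for_sync = HevoPipelineSensor(
        task_id="wait_for_hevo_sync",
        hevo_conn_id="hevo_default",   # illustrative connection ID
        pipeline_id="<pipeline-id>",   # placeholder
        poke_interval=60,              # check job status every 60 seconds
    )

    # Run downstream transformations once new data has landed.
    run_dbt = BashOperator(
        task_id="run_dbt_models",
        bash_command="dbt run --project-dir /path/to/dbt/project",
    )

    wait_for_sync >> run_dbt
```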

This design is ideal when:

  • Pipelines are already created using Hevo’s native scheduling functionality.

  • The orchestrator does not need to modify the Pipeline’s sync frequency or trigger Pipeline runs.

  • Downstream tasks need to run only when new data is available at the Destination.

Pipelines with No Schedules

Create a Pipeline with no schedule when you want the orchestrator to:

  • Trigger the Pipeline run using Hevo APIs.

  • Poll the Pipeline job status until it completes.

  • Decide when to retry a run or start the next run.

Designing the No-Schedule Pipeline

  1. Configure the Edge Pipeline with the sync frequency set to Sync On Demand. The Pipeline does not run automatically; it runs only when triggered manually or programmatically.

  2. In Airflow, create a DAG that uses the Hevo Operator to trigger the Pipeline run and the Hevo Sensor to poll the job status until it completes, and then run your downstream tasks (see the sketch after these steps).
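A minimal sketch of this design, again with hypothetical operator and sensor names (HevoRunPipelineOperator, HevoPipelineSensor) and illustrative parameters:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical imports; the actual module paths and class names may differ.
from hevo_provider.operators import HevoRunPipelineOperator
from hevo_provider.sensors import HevoPipelineSensor

with DAG(
    dag_id="hevo_on_demand_pipeline_dag",
    start_date=datetime(2026, 2, 20),
    schedule=None,  # Airflow (or an upstream event) decides when to run
    catchup=False,
) as dag:
    # Trigger the on-demand Pipeline run via Hevo's external APIs.
    trigger_sync = HevoRunPipelineOperator(
        task_id="trigger_hevo_sync",
        hevo_conn_id="hevo_default",   # illustrative connection ID
        pipeline_id="<pipeline-id>",   # placeholder
    )

    # Poll the job status until the triggered run completes.
    wait_for_sync = HevoPipelineSensor(
        task_id="wait_for_hevo_sync",
        hevo_conn_id="hevo_default",
        pipeline_id="<pipeline-id>",
        poke_interval=60,
        retries=2,  # the orchestrator owns retries in this design
    )

    run_dbt = BashOperator(
        task_id="run_dbt_models",
        bash_command="dbt run --project-dir /path/to/dbt/project",
    )

    trigger_sync >> wait_for_sync >> run_dbt
```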

This design is ideal when:

  • The Pipeline must run only after specific upstream events, such as the completion of a task in another system.

  • The workflow manager must own all schedules, dependencies, retries, and SLAs across various systems.

  • The Pipelines represent one stage in a larger multi-layer data flow.


Revision History

Refer to the following table for the list of key updates made to this page:

Date         Release   Description of Change
Feb-20-2026  NA        New document.
