Using Edge Pipelines with Airflow

Last updated on Feb 20, 2026

Edge Pipeline is now available for Public Review. You can explore and evaluate its features and share your feedback.

Apache Airflow is a workflow management system that lets you define batch workflows as code. It provides robust scheduling, dependency management, retries, and monitoring, making it a natural fit for orchestrating Edge Pipelines. The Hevo Airflow Provider enables triggering and monitoring Hevo Pipeline syncs from Airflow Directed Acyclic Graphs (DAGs) with deferrable execution support. You can coordinate data syncs from your Source to any supported Destination, making the data available to downstream jobs, such as dbt models or BI tools, in a single, end-to-end workflow.


Prerequisites

  • An active Hevo account. You can start with a 14-day full-feature free trial account.

  • A working Apache Airflow environment with version 2.4.0 or above.

    Note: For production environments, the minimum recommended version is 2.4.0.

  • Python version 3.9 or above.

  • A GitHub account to access the Hevo Airflow Provider repository.

  • A user with the Administrator role in Hevo for generating API credentials in your account region.


Perform the following steps to integrate Edge Pipelines with Apache Airflow:

  1. Obtain credentials to access Hevo APIs.

  2. Configure an Edge Pipeline to sync data on demand.

    Note: You can skip this step if you want to run a DAG from the examples in the Hevo Airflow Provider repository.

  3. Install the Hevo Airflow Provider in your Airflow environment.

  4. Create an Airflow connection to Hevo.

  5. Test the Hevo Airflow Provider setup:

    1. Create an Airflow DAG to trigger data syncs for your Edge Pipeline.

    2. Refresh the Airflow UI.

    3. Trigger and Monitor the DAG.


Obtain Hevo API Credentials

To trigger Pipeline syncs using Hevo APIs, you need to generate API credentials. Perform the following steps to do this:

  1. Log in to your Hevo account and click the link for your account region to access the API Keys page.

  2. Click + New API Key to generate the API key for your account region.

    New API Key

    Note: The credentials are valid for all APIs in that region and need to be generated only once.

  3. In the API Key window, click the copy icon in the Access Key and Secret Key fields and save them securely like any other password. Optionally, click Download to save the API credentials on your local machine.

    Click Download

    Note: The Secret Key is not displayed once you close the API Key window.

Provide these API credentials when creating an Airflow connection to Hevo.
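
If you want to sanity-check the credentials before wiring them into Airflow, you can call the Hevo API directly; the Access Key and Secret Key are typically supplied as the username and password of an HTTP Basic Auth request. The snippet below is a minimal sketch using the requests library: the endpoint path is a placeholder, so substitute an actual resource from the Hevo API documentation and use the hostname for your account region.

    import requests

    ACCESS_KEY = "<your Access Key>"
    SECRET_KEY = "<your Secret Key>"

    # Placeholder endpoint: replace <resource path> with an actual resource from
    # the Hevo API documentation, and use the hostname for your account region.
    url = "https://asia.hevodata.com/<resource path>"

    # The Access Key and Secret Key act as HTTP Basic Auth credentials.
    response = requests.get(url, auth=(ACCESS_KEY, SECRET_KEY))
    print(response.status_code)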


Create a Sync On Demand Pipeline

You can use the Hevo Airflow Provider to trigger data syncs for an Edge Pipeline. For this, the Pipeline must be created to sync data on demand.

Note: Skip to the Install the Hevo Airflow Provider section if you want to run a DAG from the examples in the Hevo Airflow Provider repository.

Perform the following steps to create the Pipeline:

  1. Log in to your Hevo account. By default, Pipelines is selected in the Navigation Bar.

  2. On the Pipelines page, click + Create Pipeline, and then do the following:

    1. Set up the Source that you want to use for data ingestion.

    2. Configure the Destination where you want to replicate your Source data.

    3. Select Sync On Demand for the sync frequency when providing the Pipeline-specific settings.

    4. Select the Source objects from which you want to ingest data.

  3. Click Save & Continue. To proceed to the Review page, you must select at least one object for data ingestion.

  4. In the Summary screen, review all the configurations defined for the Pipeline.

  5. Click Save Pipeline to create the Pipeline.

  6. Verify that the Pipeline is Enabled and Not Scheduled, and note the Pipeline ID. You need to provide this ID to the DAG that you want to run.

    Pipeline ID
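
    Note: If you prefer not to hard-code the Pipeline ID in your DAG files, one option is to store it as an Airflow Variable (under Admin > Variables in the Airflow UI) and read it when the DAG is parsed. This is optional, and the variable name below is only illustrative.

    from airflow.models import Variable

    # Illustrative: read the Pipeline ID from an Airflow Variable named
    # "hevo_pipeline_id"; create the Variable under Admin > Variables first.
    pipeline_id = int(Variable.get("hevo_pipeline_id"))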


Install the Hevo Airflow Provider

You can install the Hevo Airflow Provider from its GitHub repository. Perform the following steps to do this:

  1. Open a Terminal window and navigate to the location where you want to clone the provider repository.

  2. Run the following command to clone the repository:

    git clone https://github.com/hevoio/hevo-airflow-provider.git
    
  3. Run the following commands to install the provider in your Airflow installation:

    cd hevo-airflow-provider
    pip install .
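
    Note: To confirm that the provider was installed into the same Python environment that Airflow uses, you can run a quick import check. The import path below matches the one used in the example DAG later on this page; adjust it if your installed version of the provider exposes a different module path.

    # Quick import check; a successful print confirms the package is importable.
    # The module path is taken from the example DAG on this page.
    from airflow.hevo.operators import HevoPipelineOperator

    print("Hevo provider import OK:", HevoPipelineOperator.__name__)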
    

Create an Airflow Connection to Hevo

Create an Airflow connection to Hevo so your DAGs can trigger on-demand syncs for your Pipelines.

Perform the following steps to do this:

  1. From a web browser, access the Airflow UI at http://localhost:8080 and log in as an admin user.

  2. On the Airflow page, click Admin in the top navigation bar and select Connections from the drop-down.

    Select Connections

  3. On the List Connection page, click the + icon to add a new record.

    List Connection

  4. On the Add Connection page, specify the following:

    Add Connection

    • Connection Id: A unique name to reference the connection in your DAGs. Enter hevo_airflow_conn_id, which is the default value for the Hevo Airflow Provider.

    • Connection Type: The type of the external system. Select HTTP from the drop-down. This allows Airflow to manage the base URL and credentials.

    • Description: A suitable text describing the connection.

    • Host: The hostname or IP address of the external server. Provide the Hevo hostname for your account region. For example, asia.hevodata.com.

    • Schema: The internal database for data systems or the communication protocol for web services. As the Connection Type is HTTP, set this to https.

    • Login: The username for connecting to the external system. Provide the access key from the generated API Credentials.

    • Password: The password for the specified Login. Provide the secret key from the generated API credentials.

    • (Optional) Extra: A JSON-formatted field to store non-standard, provider-specific configurations, such as custom HTTP headers. For example, {"headers": {"X-Custom-Header": "value"}}.

  5. Click Save.

On the List Connection page, verify that the connection has been created. You reference this connection by its Connection Id in the DAGs you create.
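
To confirm that Airflow resolves the connection to the values you entered, you can look it up with BaseHook from the same environment. This is a quick check that reads the Airflow metadata database, so run it where Airflow is configured:

    from airflow.hooks.base import BaseHook

    # Fetch the connection by the Connection Id entered in the Airflow UI.
    conn = BaseHook.get_connection("hevo_airflow_conn_id")

    # The Secret Key (password) is intentionally not printed.
    print(conn.conn_type, conn.host, conn.schema, conn.login)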


Test the Hevo Airflow Provider Setup

You can test the setup using one of the sample DAGs from the dag_examples directory of the Hevo Airflow Provider repository or by adding a DAG that syncs data from the Pipeline created in the Create a Sync On Demand Pipeline section. The steps below create a DAG.

Perform the following steps to test the setup:

1. Create the DAG file

  1. Open a terminal window and navigate to your Airflow home directory.

  2. If the dags directory does not exist, run the following command to create it:

    mkdir -p dags
    
  3. Navigate to the dags directory:

    cd dags
    
  4. Run the following command to create a DAG file:

    touch test_hevo_sync.py
    

    Note: You can replace the sample value, test_hevo_sync, in the command above with your own DAG name.

  5. Open the file in a text editor and paste the following code:

    from airflow import DAG
    from datetime import datetime
    from airflow.hevo.operators import HevoPipelineOperator
    from airflow.operators.bash import BashOperator
        
    with DAG(
        dag_id="hevo_test_sync", 
        start_date=datetime(2026, 1, 1), 
        schedule_interval=None, # Set to None for manual or external triggering
        catchup=False
    ) as dag:
        
        # 1. Trigger and Monitor the Hevo Pipeline
        sync_and_monitor = HevoPipelineOperator(
            task_id="run_hevo_pipeline",
            pipeline_id=<your Pipeline ID>,         # Pipeline ID from Step 2
            connection_id="hevo_airflow_conn_id",   # Must match the Conn ID in UI
        
            # MONITORING: Setting wait_for_completion to True enables status polling
            wait_for_completion=True,
        
            # EFFICIENCY: Deferrable mode uses the Hevo Triggerer to free up worker slots
            deferrable=True
        )
        
        # 2. Downstream task: Runs only after the Pipeline job completes successfully
        post_sync_transform = BashOperator(
            task_id="run_dbt_transformations",
            bash_command="dbt run"
        )
        
        # Ensures dbt runs only after fresh data is available
        sync_and_monitor >> post_sync_transform
    
  6. Save the file and exit the editor. Airflow automatically detects new DAG files placed in the dags directory.

Refer to the How the Example Works section to understand what the DAG does.
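
If the DAG does not appear in the next step or reports an import error, a quick way to surface parsing problems is to load the dags folder into a DagBag from your Airflow environment. The following is a minimal sketch; adjust the folder path to your setup.

    from airflow.models import DagBag

    # Parse all files in the dags folder without loading Airflow's bundled examples.
    dag_bag = DagBag(dag_folder="dags", include_examples=False)

    # Tracebacks from files that failed to parse are collected per file path.
    print("Import errors:", dag_bag.import_errors)
    print("Parsed DAGs:", list(dag_bag.dags))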

2. Refresh the Airflow UI

  1. From a web browser, access the Airflow UI at http://localhost:8080 and log in as an admin user.

  2. On the Airflow page, click DAGs in the top navigation bar.

    Click DAGs

  3. If the scheduler has successfully parsed the file added above, the DAG hevo_test_sync should appear in the list. If it is not displayed, wait for a minute and refresh the browser.

    DAG List

    Note:

    • Airflow’s scheduler automatically scans the dags directory in your Airflow installation every 30-60 seconds.

    • You must resolve the parsing errors, if any, before proceeding further.

  4. By default, the DAG is paused. Unpause the DAG by clicking the toggle switch on the left.

    Unpause DAG

3. Trigger and Monitor the DAG

  1. On the DAGs page, click the ID of the DAG that you want to trigger. For example, hevo_test_sync. Ensure that the DAG is active.

    Click DAG ID

  2. On the DAG: <Your DAG ID> page, click the Play button and select Trigger DAG from the drop-down.

    Trigger DAG

  3. Click the task square to open the details panel and select Log to monitor the response from Hevo APIs.

    Select Log

  4. Check the Job history in your Pipeline to confirm that the data sync was triggered.

    Job History
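
As an alternative to the UI, you can trigger the same DAG programmatically through Airflow's stable REST API. The sketch below assumes that an authentication backend accepting a username and password (for example, basic auth) is enabled for the API; the credentials are placeholders.

    import requests

    # Create a new DAG run for hevo_test_sync through the stable REST API.
    url = "http://localhost:8080/api/v1/dags/hevo_test_sync/dagRuns"

    response = requests.post(
        url,
        auth=("<airflow username>", "<airflow password>"),  # placeholder credentials
        json={"conf": {}},  # optional run-level configuration
    )
    print(response.status_code, response.json())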

How the Example Works

The DAG file in the Create the DAG file section demonstrates how to orchestrate a Sync On Demand Pipeline. It triggers the sync and waits for it to finish before running a downstream transformation.

The following explains how the DAG configuration manages an end-to-end data flow:

  • Integrated Orchestration: The HevoPipelineOperator is the single point of control that triggers the job and retrieves the details of the active run.

  • Blocking for Monitoring: Setting wait_for_completion=True ensures that the task remains in an active state in Airflow until the Pipeline job reaches the Completed or Completed with Failures state.

  • State Polling: While the task is active, it uses the Pipeline Job APIs to poll for the current state of the job.

  • Resource Optimization: Setting deferrable=True delegates polling to the HevoTriggerer service, releasing Airflow worker slots while waiting for long-running data ingestion processes.

  • Dependency Alignment: This design ensures that downstream tasks run only after fresh data is confirmed to be available at the Destination.
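
To make the monitoring behavior described above concrete, the following is a purely conceptual sketch of a status-polling loop. It is not the provider's implementation; get_job_status is a hypothetical placeholder for a call to the Pipeline Job APIs, and the terminal state names mirror the ones listed above.

    import time

    def get_job_status(pipeline_id: int) -> str:
        """Hypothetical placeholder for a call to the Hevo Pipeline Job APIs."""
        # A real check would query the current job state over HTTP.
        return "Completed"

    def wait_for_completion(pipeline_id: int, poll_interval: int = 30) -> str:
        # Keep polling until the job reaches a terminal state; this mirrors what
        # wait_for_completion=True does, with deferrable=True handing the polling
        # over to the triggerer instead of holding a worker slot.
        while True:
            status = get_job_status(pipeline_id)
            if status in ("Completed", "Completed with Failures"):
                return status
            time.sleep(poll_interval)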


Revision History

Refer to the following table for the list of key updates made to this page:

Date          Release   Description of Change
Feb-20-2026   NA        New document.
