Airflow Concepts

Last updated on Feb 20, 2026


Apache Airflow® is an open-source platform for creating, scheduling, and monitoring workflows using Python. Its Python framework lets you build complex workflows that are scalable, reliable, and extensible.


Core Framework Components

The core framework defines the workflow logic and rules for managing resources. It includes the following components:

  • Directed Acyclic Graph (DAG): The structural framework of a workflow. It manages processes and groups related tasks into a single logical flow. By establishing relationships between tasks, a DAG orchestrates the workflow and defines an acyclic (non-circular) sequence for running them.

  • Worker Slot: A process assigned to each task while it runs. If all available slots are occupied, new tasks are queued until a slot is released.

  • Task: A fundamental unit of work in Airflow. Tasks form the individual steps of a DAG. The basic types of tasks in Airflow are listed below, followed by a minimal example DAG:

    • Operators: An Operator is a template that defines the actual work to be performed. It is a predefined class that determines how a specific job is done. For example, HevoOperator is a custom operator that triggers the Hevo Pipeline.

    • Sensors: A special type of Operator that waits for an event to occur. Sensors are long-running tasks that pause a DAG until an external condition is met. For example, HevoSensor is a custom sensor that waits for a Hevo Pipeline job to reach the Completed status.
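
The following is a minimal sketch of how a DAG, tasks, and Operators fit together, assuming Airflow 2.4 or later (where the schedule argument is available). The DAG ID, task IDs, and callables are illustrative placeholders, not part of any Hevo integration:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# A minimal DAG with two operator tasks.
with DAG(
    dag_id="example_dag",
    start_date=datetime(2026, 2, 1),
    schedule="@daily",
    catchup=False,
) as dag:

    def extract():
        print("extracting data")

    def load():
        print("loading data")

    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # The >> operator establishes the dependency: load runs after extract.
    extract_task >> load_task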

Choosing Sensor Modes

Because sensors spend most of their time waiting for an external event, they can be configured to run in one of the following modes (see the sketch after this list):

  • Poke: In this mode, the sensor retains the worker slot for the entire wait duration, blocking other tasks from using that slot until the specified event occurs. Use this mode when the wait time is expected to be short, for example, less than a minute.

  • Reschedule: In this mode, the sensor holds the worker slot only while actively checking for the specified condition. If the condition is not met, it releases the slot and pauses until the next check. Use this mode when the wait time is extended or unpredictable, for example, from several minutes to hours.
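
The following sketch shows both modes, assuming Airflow 2.x. FileSensor is a built-in sensor used here only for illustration; the file path and intervals are placeholders, and the tasks are shown outside a DAG for brevity:

from airflow.sensors.filesystem import FileSensor

# Poke mode: the task holds its worker slot for the entire wait.
# Suitable for short waits, well under a minute.
wait_short = FileSensor(
    task_id="wait_for_file_poke",
    filepath="/data/input.csv",    # placeholder path
    mode="poke",
    poke_interval=10,              # check every 10 seconds
    timeout=60,                    # fail if the file is absent after a minute
)

# Reschedule mode: the worker slot is released between checks, so long
# or unpredictable waits do not block other tasks.
wait_long = FileSensor(
    task_id="wait_for_file_reschedule",
    filepath="/data/input.csv",    # placeholder path
    mode="reschedule",
    poke_interval=300,             # check every 5 minutes
    timeout=6 * 60 * 60,           # give up after 6 hours
)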


Understanding the Architecture

Airflow uses a distributed, modular architecture to manage task execution. The following components form the engine that handles task scheduling and orchestration:

  • Webserver: The primary dashboard for visualizing the health of your Airflow workflows and managing DAG runs.

  • Scheduler: The component that determines when each task should run based on its dependencies and timing.

  • Triggerer: A lightweight service that handles deferrable tasks to save resources.

  • Worker: The service that runs the tasks assigned by the scheduler.

  • Metadata Database: The central repository that stores every change in state for your tasks and connections.


Infrastructure and Connectivity

The infrastructure and connectivity layer enables Airflow to communicate securely with external systems. It includes the following components, illustrated in the sketch after this list:

  • Connections: An Airflow connection is a set of parameters that includes details such as hostname, port, credentials, the type of system, and a unique name. Airflow stores this information in the metadata database. For example, hevo_airflow_conn_id represents the Airflow HTTP connection for Hevo.

  • Hooks: A high-level interface used to communicate with external platforms, such as databases or APIs. Hooks use the details retrieved from a Connection to manage authentication and communication. For example, HevoHook uses the Hevo Pipelines and Pipeline Jobs APIs.

  • Triggerer: A centralized service that manages deferred tasks by monitoring external systems asynchronously. Instead of tying up a worker slot, a deferrable task delegates monitoring to the Triggerer service, which signals the task to resume when the condition is met. For example, HevoSensor uses an asynchronous trigger to poll Hevo APIs in the background.
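
The following is a minimal sketch of a Connection and a Hook, assuming the HTTP provider package is installed. In practice, the connection would be created through the Airflow UI or CLI rather than in code; the host, credential, and endpoint below are placeholders:

from airflow.models import Connection
from airflow.providers.http.hooks.http import HttpHook

# A Connection bundles the parameters for reaching an external system.
conn = Connection(
    conn_id="hevo_airflow_conn_id",
    conn_type="http",
    host="<hevo-region-host>",     # placeholder, e.g. your Hevo region URL
    password="<api-secret>",       # placeholder credential
)

# A Hook reads those details from the metadata database and manages
# authentication and communication. HttpHook is the generic HTTP hook;
# a custom hook such as HevoHook would typically build on something
# similar. The endpoint is a placeholder.
hook = HttpHook(method="POST", http_conn_id="hevo_airflow_conn_id")
response = hook.run(endpoint="<pipeline-run-endpoint>")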


Example

The following example workflow sends a Slack notification to the analytics team when new data becomes available, so that the team can update the sales dashboard. A sketch of the full DAG follows the sequence of events below.

DAG flow

trigger_hevo_pipeline >> wait_for_completion >> notify_analytics_team_on_slack

Sequence of Events

  1. trigger_hevo_pipeline: This operator task triggers the Hevo Pipeline by invoking the required API using the configured connection and hook.

  2. wait_for_completion: This sensor monitors the status of the Pipeline job. If the sensor runs in the Reschedule or Deferrable mode, it releases the worker slot while waiting for the job to complete.

  3. notify_analytics_team_on_slack: Once the Pipeline job completes, the operator task sends a Slack notification to the analytics team.

Once the notification is received, the analytics team can update their sales dashboard with the new data.
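
The following sketch puts the whole workflow together, assuming Airflow 2.4 or later and the Slack provider package. The HevoOperator and HevoSensor import paths and parameter names are illustrative assumptions, as is the pipeline ID:

from datetime import datetime

from airflow import DAG
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

# Assumed import paths for the custom Hevo classes described above.
from hevo_provider.operators import HevoOperator
from hevo_provider.sensors import HevoSensor

with DAG(
    dag_id="hevo_sales_pipeline",
    start_date=datetime(2026, 2, 1),
    schedule="@daily",
    catchup=False,
) as dag:

    # Operator task: triggers the Hevo Pipeline through the configured
    # connection and hook.
    trigger_hevo_pipeline = HevoOperator(
        task_id="trigger_hevo_pipeline",
        hevo_conn_id="hevo_airflow_conn_id",   # assumed parameter name
        pipeline_id="<pipeline-id>",           # placeholder
    )

    # Sensor task: waits for the Pipeline job to reach the Completed
    # status. Reschedule mode frees the worker slot between checks.
    wait_for_completion = HevoSensor(
        task_id="wait_for_completion",
        hevo_conn_id="hevo_airflow_conn_id",   # assumed parameter name
        pipeline_id="<pipeline-id>",           # placeholder
        mode="reschedule",
        poke_interval=300,                     # check every 5 minutes
    )

    # Operator task: notifies the analytics team once the job completes.
    notify_analytics_team_on_slack = SlackWebhookOperator(
        task_id="notify_analytics_team_on_slack",
        slack_webhook_conn_id="slack_default", # assumed connection ID
        message="New sales data is available. You can update the dashboard.",
    )

    trigger_hevo_pipeline >> wait_for_completion >> notify_analytics_team_on_slack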


Revision History

Refer to the following table for the list of key updates made to this page:

Date          Release   Description of Change
Feb-20-2026   NA        New document.
