Airflow v1 vs v2

Apache Airflow has become the de facto standard for orchestrating data workflows, enabling teams to author, schedule, and monitor complex pipelines with ease.

Originally developed at Airbnb, it has grown into a mature open-source project used by organizations across the globe.

With the release of Airflow v2, the project underwent a significant transformation.

While Airflow v1 laid the foundation, v2 introduced critical enhancements aimed at addressing long-standing issues around scalability, stability, and developer experience.

The shift from v1 to v2 wasn’t just incremental—it redefined how teams build and manage workflows.

If you’re still on Airflow v1 or are evaluating a migration, understanding the key differences between the two versions is essential.

This guide breaks down the architectural, functional, and operational improvements introduced in v2, helping you make an informed decision.

Whether you’re managing ETL pipelines, ML workflows, or infrastructure automation—as discussed in our Airflow vs Terraform and Airflow vs Cron comparisons—knowing what version of Airflow you’re using can have a profound impact on performance and maintainability.

For broader context on Airflow’s ecosystem, check out our related comparison: Airflow vs Rundeck.


High-Level Summary of Changes

Apache Airflow v2 introduced a suite of enhancements that addressed critical pain points in the v1.x series.

While Airflow v1.x laid the groundwork for workflow orchestration, it struggled with scalability, operational complexity, and limited extensibility.

Airflow v2.x significantly improves the architecture and user experience with new core components, better task execution, and a more robust API.

Here’s a high-level comparison of the major differences between the two versions:

| Feature / Area | Airflow v1.x | Airflow v2.x |
|---|---|---|
| Scheduler | Single-threaded, limited scalability | Multi-scheduler support with better scaling |
| DAG Parsing | Serialized via pickling | DAG serialization in JSON |
| Task Execution | Limited parallelism | Enhanced parallelism via smart sensors and task groups |
| API | Experimental, limited endpoints | Full REST API with RBAC support |
| Task Dependency Syntax | set_upstream(), set_downstream() | Native >> and << operators |
| CLI & UI | Older UI, inconsistent CLI | Modern UI, unified CLI |
| Security & Access Control | Basic, plugin-dependent | Full RBAC with role support |
| Scheduler Resilience | Prone to missed runs or bottlenecks | HA-ready, decoupled scheduling |
| Plugins | Plugin loading inconsistencies | Stable, namespaced plugins |
| Community Adoption | Legacy, limited support | Active development and community best practices |

Airflow 2.x isn’t just an upgrade—it’s a re-architecture.

The improvements make it far more production-ready for data teams, SREs, and platform engineers running complex pipelines at scale.


TaskFlow API in v2

One of the most transformative features introduced in Airflow 2.x is the TaskFlow API, which allows developers to build DAGs using Python functions rather than relying solely on traditional Operators.

This new, functional approach improves readability, testing, and maintainability of Airflow pipelines.

What Is the TaskFlow API?

The TaskFlow API brings native Python function support to Airflow tasks.

It uses the @task decorator to convert regular Python functions into Airflow tasks automatically—handling serialization, logging, and XCom (cross-communication) under the hood. This leads to cleaner DAGs and less boilerplate code.

Task Definition: v1 vs v2

Let’s compare the task definition process in Airflow v1 and v2 using a simple ETL example.

Airflow v1.x (Traditional Operators)

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def extract():
    return {"data": "some_value"}

def transform(**context):
    # Pull the upstream task's return value from XCom explicitly
    data = context['ti'].xcom_pull(task_ids='extract')
    return data['data'].upper()

with DAG('v1_example', start_date=datetime(2023, 1, 1), schedule_interval='@daily') as dag:
    extract_task = PythonOperator(task_id='extract', python_callable=extract)
    transform_task = PythonOperator(task_id='transform', python_callable=transform, provide_context=True)

    extract_task >> transform_task
```

Airflow v2.x (TaskFlow API)

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(start_date=datetime(2023, 1, 1), schedule_interval='@daily', catchup=False)
def v2_example():
    @task
    def extract():
        return {"data": "some_value"}

    @task
    def transform(data):
        return data['data'].upper()

    # Passing the return value wires the dependency and the XCom for you
    extracted = extract()
    transform(extracted)

v2_example()
```

Benefits of TaskFlow API

  • Simplified Syntax: Reduces boilerplate and dependency on provide_context or XCom calls.

  • Improved Testability: Functions are native Python, easier to test outside Airflow context.

  • Better Modularity: Encourages modular and reusable task design.

The TaskFlow API is a game-changer for data engineers and Python developers who want to create clean, readable, and maintainable DAGs.
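The testability benefit is easy to see in practice: because the logic is plain Python, it can be unit-tested without an Airflow runtime at all. A minimal sketch (the function mirrors the `transform` step from the example above; in the DAG file it would simply be wrapped with `@task`):

```python
# Plain Python business logic; the DAG file wraps this with @task.
# No Airflow import or running scheduler is needed to test it.
def transform(data):
    return data['data'].upper()

# An ordinary unit test, runnable with any test runner:
assert transform({'data': 'some_value'}) == 'SOME_VALUE'
```

Keeping logic in importable functions like this also makes it reusable across DAGs, which is harder to achieve with Operator subclasses.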


Scheduler and Executor Enhancements

One of the key limitations in Airflow 1.x was its single-scheduler architecture, which created a potential single point of failure.

This posed scalability and high availability challenges in production environments.

Airflow 2.x addressed this limitation with a re-architected scheduling system designed for resilience and performance.

Limitations in Airflow v1

In Airflow v1.x:

  • Only one scheduler could run at a time.

  • If that scheduler failed, no new tasks would be scheduled.

  • There was no built-in mechanism for horizontal scaling of the scheduler itself.

  • Executors like CeleryExecutor and KubernetesExecutor had limited observability and performance tuning options.

This meant that large-scale deployments often required custom workarounds or external monitoring to ensure reliability.

Improvements in Airflow v2

Airflow 2.x introduced a multi-scheduler architecture with native support for High Availability (HA).

Now, you can run multiple schedulers in parallel, and they coordinate safely using database-level locks, ensuring that no DAG or task gets scheduled more than once.
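Operationally, enabling HA scheduling is a deployment sketch rather than a config flag: you start the same scheduler command on more than one host, all pointing at one shared metadata database. The exact setup depends on your environment, and the database must support row-level locking for the coordination to work:

```shell
# Run on each scheduler host. All hosts must share a single metadata
# database that supports SELECT ... FOR UPDATE SKIP LOCKED
# (e.g. PostgreSQL 10+ or MySQL 8+).
airflow scheduler
```

The schedulers need no awareness of each other; the database locks prevent any DAG run or task from being scheduled twice.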

Key Enhancements:

  • Multi-Scheduler Support

    • Multiple schedulers can be deployed concurrently.

    • Eliminates the single point of failure from v1.x.

    • Greatly improves scalability for environments with many DAGs.

  • Executor Upgrades

    • CeleryExecutor improvements:

      • More efficient task queuing and worker communication.

      • Better integration with observability tools.

    • KubernetesExecutor enhancements:

      • More stable pod launching behavior.

      • Improved resource handling for ephemeral tasks.

      • Reduced scheduler-pod communication overhead.

  • Faster Scheduling Loop

    • The new scheduling loop is faster and more efficient, enabling better throughput for large DAGs.

Real-World Impact

For teams running hundreds or thousands of DAGs, these enhancements are crucial.

The multi-scheduler feature ensures resilience, and improved executors make distributed execution more efficient, especially in cloud-native or Kubernetes-based environments.


New REST API

One of the most anticipated improvements in Airflow 2.x is the introduction of a stable, production-ready REST API.

In contrast to Airflow 1.x, which only offered an experimental API with limited functionality and no guarantees of stability, Airflow 2.x provides a robust interface for interacting with your workflows programmatically.

Limitations in v1

Airflow 1.x included an experimental API that:

  • Lacked comprehensive documentation.

  • Was prone to breaking changes between versions.

  • Supported only a limited set of operations (e.g., triggering DAGs).

  • Offered no authentication or authorization mechanisms out of the box.

This made it difficult for teams to integrate Airflow cleanly into CI/CD pipelines or automation systems.

Improvements in v2

Airflow 2.x introduces a stable, OpenAPI-compliant REST API that is:

  • Well-documented and versioned.

  • Secure, supporting authentication (via JWT, basic auth, etc.) and role-based access control.

  • Extensible, enabling teams to build custom tooling and integrations.

With the new API, you can:

  • Trigger DAG runs and tasks.

  • Monitor DAG execution status.

  • Manage variables, connections, pools, and other metadata programmatically.

  • Automate workflow deployment, testing, and monitoring in CI/CD pipelines.
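As a sketch of what programmatic access looks like, the snippet below builds (without sending) the request that triggers a DAG run via the stable `POST /api/v1/dags/{dag_id}/dagRuns` endpoint. The host URL is a local-default assumption, and a real call would also need authentication headers configured for your deployment:

```python
import json
import urllib.request

# Hypothetical local deployment; adjust for your environment.
AIRFLOW_URL = "http://localhost:8080/api/v1"

def build_trigger_request(dag_id, conf=None):
    """Build (but do not send) the POST that creates a new DAG run."""
    body = json.dumps({"conf": conf or {}}).encode()
    return urllib.request.Request(
        url=f"{AIRFLOW_URL}/dags/{dag_id}/dagRuns",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

req = build_trigger_request("v2_example", conf={"run_date": "2023-01-01"})
# urllib.request.urlopen(req) would send it once auth is added.
```

The same pattern extends to the other endpoints (DAG run status, variables, connections, pools), which is what makes CI/CD integration practical in v2.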

Example Use Cases

  • DevOps teams can trigger DAGs from CI/CD pipelines (e.g., GitHub Actions, Jenkins).

  • Data engineers can integrate Airflow with data cataloging or quality tools.

  • Platform teams can automate environment bootstrapping and monitoring.

You can explore the full API spec via the /api/v1/ endpoint or by visiting the official Swagger UI interface.


Deferrable Sensors and Smart Triggering

One of the most impactful changes in Apache Airflow 2.x is the introduction of Deferrable Operators and the Triggerer—a major step toward improving resource efficiency for long-running tasks.

The Problem with Sensors in v1

In Airflow 1.x, Sensors (e.g., TimeSensor, ExternalTaskSensor, S3KeySensor) were blocking tasks.

This means they occupied a worker slot for the entire duration of their wait, often leading to:

  • Wasted resources, especially when many sensors were active.

  • Scheduler bottlenecks when the system had to manage thousands of active but idle tasks.

  • Increased cost and complexity in distributed environments like Kubernetes or Celery.

The Solution in v2: Deferrable Operators

Airflow 2.x solves this inefficiency with Deferrable Operators.

These operators “defer” their execution while waiting and hand off control to a lightweight process called the Triggerer.

This allows the task to:

  • Free up the worker slot during wait time.

  • Be reactivated only when the condition is met.

  • Scale to handle thousands of idle wait conditions without overwhelming the system.

This is especially useful in cloud-native environments, where blocking costs money.

Key Component: The Triggerer

The Triggerer is a new daemon process introduced in Airflow 2.2+ that handles deferred tasks asynchronously.

It efficiently manages large numbers of sleeping sensors without using workers, improving overall scalability and performance.
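The principle behind the Triggerer can be illustrated with plain `asyncio` (this is an analogy, not Airflow code): thousands of pending waits share one event loop on a single thread, instead of each occupying a worker slot the way a blocking sensor would.

```python
import asyncio

async def wait_for_condition(name, delay):
    # Yields control to the event loop while "waiting", so this wait
    # consumes no thread or worker slot of its own.
    await asyncio.sleep(delay)
    return f"{name} fired"

async def main():
    # 1,000 concurrent "sensors" handled by one thread.
    waits = [wait_for_condition(f"sensor_{i}", 0.01) for i in range(1000)]
    return await asyncio.gather(*waits)

results = asyncio.run(main())
print(len(results))  # 1000
```

This is why one Triggerer daemon can supervise far more idle waits than a pool of workers running blocking sensors ever could.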

Example: TimeSensor vs TimeSensorAsync

Airflow v1 – Blocking TimeSensor:

```python
from datetime import time

from airflow.sensors.time_sensor import TimeSensor

# target_time must be a datetime.time, not a string
wait_until = TimeSensor(task_id='wait_until_6am', target_time=time(6, 0))
```

Airflow v2 – Deferrable TimeSensorAsync:

```python
from datetime import time

from airflow.sensors.time_sensor import TimeSensorAsync

wait_until = TimeSensorAsync(task_id='wait_until_6am', target_time=time(6, 0))
```

The TimeSensorAsync releases the worker after deferring, and the Triggerer reactivates it when it’s time to resume.

Bottom Line

  • Use deferrable operators for sensors that may wait minutes or hours.

  • Greatly improves resource efficiency and task scalability.

  • A must-have for any production-grade Airflow deployment handling large DAG volumes or event-based scheduling.

