Airflow deployment on Kubernetes can be a great choice for complex workflows.
Apache Airflow is a powerful workflow automation tool used for scheduling and monitoring data pipelines.
When deployed on Kubernetes, Airflow benefits from scalability, resource efficiency, and workload isolation, making it an ideal solution for managing complex workflows in cloud-native environments.
Why Deploy Airflow on Kubernetes?
Traditional Airflow deployments can face challenges such as resource limitations, dependency conflicts, and manual scaling efforts.
By leveraging Kubernetes, teams can:
✅ Scale dynamically – Automatically adjust resources based on DAG workloads.
✅ Optimize resource usage – Efficiently allocate CPU and memory across tasks.
✅ Ensure isolation – Run tasks in separate, containerized environments for better stability.
Deployment Methods: Manual Setup vs. Helm
There are two primary ways to deploy Airflow on Kubernetes:
1️⃣ Manual Setup – Involves configuring Kubernetes manifests, creating Pods, Deployments, Services, and setting up persistent storage.
2️⃣ Helm Chart Deployment – Uses the official Apache Airflow Helm chart, simplifying the deployment process with pre-configured templates.
In this guide, we’ll walk through both deployment methods, highlighting their advantages and helping you choose the best approach for your needs.
💡 Further Reading:
Learn more about Apache Airflow and its architecture.
Prerequisites for Deploying Airflow on Kubernetes
Before deploying Apache Airflow on Kubernetes, ensure you have the following prerequisites set up:
1. Kubernetes Cluster Setup
You need a Kubernetes cluster to run Airflow. Depending on your environment, you can choose from:
Local Deployment: Minikube (for testing and development)
Cloud Providers:
AWS: Amazon EKS
Google Cloud: Google Kubernetes Engine (GKE)
2. Install kubectl and Helm
kubectl: Command-line tool for interacting with Kubernetes. Install it from the official documentation.
Helm: A package manager for Kubernetes that simplifies deployment. Install it from the Helm website.
3. Airflow Requirements and Resource Planning
Before deploying, consider the following:
✅ Storage & Persistence: Use PersistentVolumes for storing logs and metadata.
✅ Database: Airflow requires a PostgreSQL or MySQL database for metadata storage.
✅ Worker Resources: Plan CPU and memory allocations based on DAG complexity.
With these prerequisites in place, you’re ready to deploy Airflow on Kubernetes.
Next, we’ll explore how to set up Airflow using Helm charts.
Deploying Airflow on Kubernetes with Helm
One of the easiest and most efficient ways to deploy Apache Airflow on Kubernetes is by using Helm, a package manager for Kubernetes.
The official Airflow Helm chart simplifies the deployment process by managing all the necessary Kubernetes resources, including the scheduler, webserver, workers, and database.
1. Introduction to the Official Airflow Helm Chart
The Apache Airflow Helm chart is maintained by the Airflow community and provides a standardized way to deploy Airflow on Kubernetes.
It offers built-in configurations for:
✅ PostgreSQL or external databases
✅ CeleryExecutor, KubernetesExecutor, or LocalExecutor
✅ Auto-scaling workers
✅ Airflow webserver, scheduler, and workers as Kubernetes pods
You can find the official Helm chart here: Apache Airflow Helm Chart
2. Installing Airflow Using Helm
Once your Kubernetes cluster is ready and Helm is installed, follow these steps to deploy Airflow:
Step 1: Add the Apache Airflow Helm Repository
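A minimal sketch of the repository setup, using the repository URL documented for the official chart:

```sh
helm repo add apache-airflow https://airflow.apache.org
helm repo update
```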
This command adds the official Apache Airflow Helm repository and updates it to fetch the latest chart versions.
Step 2: Install Airflow with Default Configuration
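A typical install command; the release name and the `airflow` namespace are the defaults used throughout this guide:

```sh
helm install airflow apache-airflow/airflow --namespace airflow --create-namespace
```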
This command installs Airflow in a new namespace called `airflow` using the default settings.
3. Configuring values.yaml for Custom Deployments
To customize your Airflow deployment, you need to modify the `values.yaml` file before installation. Some key configurations include:
⚙️ Setting Up Executor Type
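A minimal `values.yaml` fragment, assuming the official chart’s top-level `executor` key:

```yaml
executor: "KubernetesExecutor"   # or "CeleryExecutor" / "LocalExecutor"
```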
Use KubernetesExecutor for a fully containerized setup or CeleryExecutor for distributed workers.
⚙️ Enabling Persistent Storage for Logs
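A sketch based on the official chart’s `logs.persistence` block; the size is an arbitrary example:

```yaml
logs:
  persistence:
    enabled: true
    size: 10Gi
```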
This ensures that Airflow logs persist even after pods restart.
⚙️ Configuring Database Backend
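A sketch assuming the official chart’s `postgresql` and `data.metadataConnection` values; hostnames and credentials below are placeholders:

```yaml
postgresql:
  enabled: true          # use the chart's bundled PostgreSQL (default)

# To use an external database instead, disable the bundled one and point
# data.metadataConnection at your own instance:
# postgresql:
#   enabled: false
# data:
#   metadataConnection:
#     user: airflow
#     pass: change-me
#     protocol: postgresql
#     host: my-external-postgres.example.com
#     port: 5432
#     db: airflow
```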
You can also connect to an external database instead of using the built-in PostgreSQL.
Deploying with Custom Configuration
Once `values.yaml` is updated, install Airflow with:
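For example (release name, namespace, and file path carried over from the earlier steps):

```sh
helm upgrade --install airflow apache-airflow/airflow \
  --namespace airflow \
  -f values.yaml
```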
Next Steps
After installation, you can access the Airflow UI, configure DAG storage, and fine-tune resource settings.
In the next section, we’ll discuss how to manage DAG deployments efficiently within your Kubernetes-based Airflow setup.
Understanding Airflow Components in Kubernetes
When deploying Apache Airflow on Kubernetes, understanding how its components interact is crucial.
Kubernetes runs each component as a separate pod, ensuring scalability, isolation, and fault tolerance.
Below is an overview of the key Airflow components in a Kubernetes deployment.
1. Web Server Deployment and Service Configuration
The Airflow web server provides the UI for monitoring DAGs, managing configurations, and checking logs.
It typically runs as a Kubernetes Deployment and is exposed via a Kubernetes Service.
Deployment Example (webserver.yaml)
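A minimal sketch of such a Deployment; the image tag, names, and environment values are illustrative rather than a production-ready manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-webserver
  namespace: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airflow-webserver
  template:
    metadata:
      labels:
        app: airflow-webserver
    spec:
      containers:
        - name: webserver
          image: apache/airflow:2.7.3     # pin to the Airflow version you run elsewhere
          args: ["webserver"]
          ports:
            - containerPort: 8080         # Airflow UI port
          env:
            - name: AIRFLOW__CORE__EXECUTOR
              value: KubernetesExecutor
```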
Service Configuration
To expose the webserver outside the cluster, we define a Kubernetes Service:
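A sketch of a LoadBalancer Service matching the Deployment labels above; a NodePort or Ingress would also work depending on your cluster:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: airflow-webserver
  namespace: airflow
spec:
  type: LoadBalancer          # provisions an external IP on most cloud providers
  selector:
    app: airflow-webserver
  ports:
    - port: 8080
      targetPort: 8080
```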
This makes the Airflow UI accessible through an external IP.
2. Scheduler and Worker Pods
Scheduler
The scheduler is responsible for monitoring and triggering DAGs. In a Kubernetes setup, it runs as a Deployment and communicates with the database to track task statuses.
Workers
Workers execute tasks in DAGs. The execution model depends on the chosen executor:
CeleryExecutor: Uses distributed worker pods.
KubernetesExecutor: Dynamically creates worker pods for each task.
For KubernetesExecutor, worker pods are created on-demand, ensuring efficient resource utilization.
3. Triggerer and DAG Execution Flow
With Airflow 2.x, the triggerer component was introduced to handle asynchronous tasks efficiently.
How DAG Execution Works in Kubernetes:
The scheduler picks up a scheduled DAG.
Based on the executor, a worker pod is created (for KubernetesExecutor) or a Celery worker picks up the task.
The task runs inside the worker pod, accessing resources like databases and storage.
Upon completion, logs and results are stored in the database and persistent storage.
4. Database Setup with Kubernetes Persistent Volumes
Airflow requires a relational database (PostgreSQL or MySQL) to store metadata, DAG runs, and task states.
In Kubernetes, we can deploy the database as a StatefulSet or use a managed service like AWS RDS, GCP Cloud SQL, or Azure Database for PostgreSQL.
PostgreSQL Deployment Example (postgres.yaml)
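A sketch of a PostgreSQL StatefulSet with a volume claim template; credentials are placeholders and should come from a Secret in practice:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: airflow-postgres
  namespace: airflow
spec:
  serviceName: airflow-postgres
  replicas: 1
  selector:
    matchLabels:
      app: airflow-postgres
  template:
    metadata:
      labels:
        app: airflow-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_USER
              value: airflow
            - name: POSTGRES_PASSWORD
              value: airflow              # use a Secret in production
            - name: POSTGRES_DB
              value: airflow
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```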
This configuration ensures the database persists even if the pod restarts.
Next Steps
Now that we’ve covered the core Airflow components in Kubernetes, the next section will focus on DAG storage and execution, including how to use Kubernetes Persistent Volumes, ConfigMaps, and Git sync to manage DAG files efficiently.
Managing DAGs in a Kubernetes Deployment
Effectively managing DAGs in Apache Airflow on Kubernetes is crucial for ensuring reliability, version control, and automation.
Since DAGs define workflows, they must be kept up to date and consistent across environments.
This section explores best practices for storing, syncing, and updating DAGs in a Kubernetes-based Airflow deployment.
1. Storing DAGs in a GitHub Repository and Syncing with Kubernetes
A best practice for Airflow DAG management is to store DAG files in a GitHub repository.
This provides:
✅ Version control – Track changes to DAGs and revert if necessary.
✅ Collaboration – Multiple team members can contribute to DAG development.
✅ Automation – Use CI/CD pipelines to deploy DAG updates.
Recommended Repository Structure
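An illustrative layout (file and folder names are examples, not requirements):

```bash
airflow-dags/
├── dags/
│   ├── example_etl.py
│   └── daily_report.py
├── plugins/
├── requirements.txt
└── .github/
    └── workflows/
        └── deploy-dags.yaml
```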
With this structure, DAG files are stored in GitHub, and Kubernetes synchronizes them automatically.
2. Using Git-Sync or Kubernetes Persistent Volumes for DAG Storage
Airflow DAGs need to be available to all scheduler and worker pods. There are two common approaches:
Option 1: Using Git-Sync to Auto-Update DAGs from GitHub
Git-Sync is a lightweight tool that automatically pulls the latest changes from a Git repository.
This ensures that Airflow DAGs remain up to date without requiring a full redeployment.
Example Deployment with Git-Sync
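A sketch of a scheduler Deployment with a git-sync sidecar; the image tags, repository URL, and paths are placeholders, and the environment variables follow the git-sync v3.x convention:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-scheduler
  namespace: airflow
spec:
  replicas: 1
  selector:
    matchLabels:
      app: airflow-scheduler
  template:
    metadata:
      labels:
        app: airflow-scheduler
    spec:
      containers:
        - name: scheduler
          image: apache/airflow:2.7.3
          args: ["scheduler"]
          env:
            - name: AIRFLOW__CORE__DAGS_FOLDER
              value: /git/repo/dags        # where git-sync checks the repo out (root/dest/dags)
          volumeMounts:
            - name: dags
              mountPath: /git
        - name: git-sync
          image: registry.k8s.io/git-sync/git-sync:v3.6.9
          env:
            - name: GIT_SYNC_REPO
              value: "https://github.com/your-org/airflow-dags.git"
            - name: GIT_SYNC_BRANCH
              value: "main"
            - name: GIT_SYNC_ROOT
              value: "/git"
            - name: GIT_SYNC_DEST
              value: "repo"
            - name: GIT_SYNC_WAIT
              value: "30"                  # pull every 30 seconds
          volumeMounts:
            - name: dags
              mountPath: /git
      volumes:
        - name: dags
          emptyDir: {}
```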
✔️ How it Works:
The Git-Sync container pulls DAGs from GitHub every 30 seconds.
The DAGs are mounted as a shared volume, making them accessible to scheduler and worker pods.
Option 2: Using Kubernetes Persistent Volumes for DAG Storage
Another option is to use Persistent Volumes (PVs) to store DAGs. This approach is useful if:
You want DAGs to persist across pod restarts.
You’re using a cloud storage-backed Persistent Volume (e.g., AWS EFS, GCP Filestore, Azure Files).
Example: DAG Storage with a Persistent Volume in Kubernetes
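A sketch of a shared DAG volume; the storage class assumes an RWX-capable backend such as AWS EFS, GCP Filestore, or Azure Files:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-dags
  namespace: airflow
spec:
  accessModes:
    - ReadWriteMany            # required so scheduler, workers, and webserver can share the volume
  storageClassName: efs-sc     # assumption: an RWX-capable storage class
  resources:
    requests:
      storage: 5Gi
```

Each Airflow pod then mounts this claim at its DAGs folder (for example `/opt/airflow/dags`) via a `persistentVolumeClaim` volume.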
✔️ How it Works:
Persistent Volumes (PVs) store DAG files.
All Airflow components (Scheduler, Workers, Webserver) mount the same DAG volume.
3. Automating DAG Updates with CI/CD
To ensure DAG updates are automatically deployed when changes are pushed to GitHub, we can use GitHub Actions for CI/CD.
Example: GitHub Actions Workflow for DAG Deployment
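A sketch of such a workflow; the `KUBECONFIG` secret, namespace, label selector, and DAG path are assumptions. If you use Git-Sync (Option 1 above), pushing to the repository is often enough, since DAGs sync automatically.

```yaml
name: Deploy DAGs
on:
  push:
    branches: [main]
    paths:
      - "dags/**"

jobs:
  deploy-dags:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure cluster access
        run: echo "${{ secrets.KUBECONFIG }}" > kubeconfig.yaml

      - name: Copy updated DAGs into the scheduler pod
        env:
          KUBECONFIG: kubeconfig.yaml
        run: |
          SCHEDULER=$(kubectl get pods -n airflow -l component=scheduler \
            -o jsonpath='{.items[0].metadata.name}')
          kubectl cp dags/ airflow/${SCHEDULER}:/opt/airflow/dags -c scheduler
```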
✔️ How it Works:
Triggers when DAG files change (`dags/**`).
Automatically updates DAGs in Kubernetes.
Next Steps
Now that we have covered DAG management strategies, the next section will focus on scaling Airflow on Kubernetes, including setting up Horizontal Pod Autoscaling (HPA) and resource requests/limits for optimizing performance.
Scaling Airflow on Kubernetes
Scaling Apache Airflow on Kubernetes ensures that workflow execution remains efficient, even as DAG complexity and task volume increase.
Kubernetes provides built-in autoscaling capabilities that allow Airflow to dynamically adjust resources based on demand.
This section covers:
✅ Configuring worker autoscaling with Kubernetes Horizontal Pod Autoscaler (HPA)
✅ Optimizing resource allocation for efficient task execution
✅ Best practices for handling large-scale workflows
1. Configuring Worker Autoscaling with Kubernetes Horizontal Pod Autoscaler (HPA)
Airflow workers are responsible for executing DAG tasks. When workloads spike, we need more workers; when workloads are light, we should scale down to save resources.
Kubernetes Horizontal Pod Autoscaler (HPA) automatically adjusts the number of worker pods based on CPU or memory usage.
Step 1: Define Resource Requests and Limits for Workers
Before enabling autoscaling, set CPU and memory requests in the worker deployment.
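For example, in `values.yaml` (assuming the official chart’s `workers.resources` block; the numbers are starting points, not recommendations):

```yaml
workers:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      cpu: "1"
      memory: "2Gi"
```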
✔️ How it Works:
`requests`: The guaranteed minimum resources for a worker pod.
`limits`: The maximum resources a pod can use.
Step 2: Enable Kubernetes HPA for Airflow Workers
Create an HPA policy to scale workers dynamically.
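A sketch of an `autoscaling/v2` policy matching the thresholds described below; the target name assumes workers run as a Deployment called `airflow-worker`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: airflow-worker-hpa
  namespace: airflow
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment          # adjust if your chart runs workers as a StatefulSet
    name: airflow-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```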
✔️ How it Works:
Scales workers between 2 and 10 replicas based on CPU usage.
Threshold set to 70% CPU utilization—if usage exceeds this, Kubernetes adds more workers.
To apply the HPA policy, run:
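Assuming the policy above is saved as `airflow-worker-hpa.yaml`:

```sh
kubectl apply -f airflow-worker-hpa.yaml
```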
2. Optimizing Resource Allocation for Task Execution
To improve performance, it’s essential to allocate optimal resources for Airflow components.
Scheduler Optimization
Increase scheduler performance by setting:
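For example, via the chart’s `config` block, which maps to `airflow.cfg` options; the values shown are illustrative, and the most useful knobs depend on your Airflow version:

```yaml
config:
  scheduler:
    parsing_processes: 4              # parse DAG files in parallel
    min_file_process_interval: 30     # seconds between re-parsing a DAG file
    scheduler_heartbeat_sec: 5
```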
If DAG scheduling is slow, increase the number of schedulers:
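For example (assuming the chart’s `scheduler.replicas` value; Airflow 2.x supports multiple schedulers against the same metadata database):

```yaml
scheduler:
  replicas: 2
```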
Worker Queue Optimization
Airflow allows worker queues to prioritize tasks based on importance. Example:
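A sketch of routing tasks to dedicated queues with CeleryExecutor; the DAG ID, queue names, and commands are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="priority_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    critical_load = BashOperator(
        task_id="load_financial_data",
        bash_command="python /opt/airflow/scripts/load.py",
        queue="high_priority",  # served by workers started with: airflow celery worker --queues high_priority
    )

    routine_cleanup = BashOperator(
        task_id="cleanup_temp_files",
        bash_command="rm -rf /tmp/staging/*",
        queue="default",        # picked up by the general-purpose worker pool
    )
```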
✔️ How it Helps:
Critical tasks are executed immediately.
Low-priority tasks wait for free resources.
3. Best Practices for Handling Large-Scale Workflows
Scaling Airflow requires efficient DAG design and resource management.
✅ Split Large DAGs into Modular Sub-DAGs
Instead of one monolithic DAG, break it into smaller, manageable DAGs.
Use TriggerDagRunOperator to trigger dependent DAGs.
✅ Use KubernetesExecutor for Task Isolation
Unlike CeleryExecutor, KubernetesExecutor runs each task in a separate pod.
Provides better resource isolation and prevents task failures from affecting others.
✅ Monitor Performance with Airflow Metrics
Use Prometheus and Grafana to track Airflow pod performance.
Set alerts if worker scaling is too slow or DAGs are delayed.
Next Steps
Now that we’ve covered scaling strategies, the next section will focus on monitoring and troubleshooting Airflow on Kubernetes, including log aggregation, alerting, and debugging common deployment issues.
Securing Your Airflow Deployment
Deploying Apache Airflow on Kubernetes introduces security challenges, especially when managing secrets, access control, and authentication.
To ensure a secure setup, follow best practices for secrets management, Role-Based Access Control (RBAC), and web UI authentication.
This section covers:
✅ Managing secrets and environment variables with Kubernetes Secrets
✅ Implementing Role-Based Access Control (RBAC) for Airflow security
✅ Setting up authentication for the Airflow web UI
1. Managing Secrets and Environment Variables with Kubernetes Secrets
Airflow requires sensitive credentials such as database passwords, API keys, and connection details.
Storing these directly in plain text inside Helm values or config files is a security risk. Instead, use Kubernetes Secrets.
Step 1: Create a Kubernetes Secret for Airflow Connections
Save secrets in a YAML file:
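A sketch of such a Secret; the values below are base64-encoded placeholders (generate a real Fernet key and credentials for your environment):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: airflow-secrets
  namespace: airflow
type: Opaque
data:
  fernet-key: bXlfZmVybmV0X2tleQ==          # echo -n "my_fernet_key" | base64
  postgres-password: YWlyZmxvdw==           # echo -n "airflow" | base64
```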
✔️ How it Works:
Secrets must be Base64 encoded (`echo -n "my_secret_value" | base64`).
`FERNET_KEY` is required for encrypting connections in Airflow.
Store database connection strings securely.
Step 2: Mount Secrets as Environment Variables in Airflow Pods
Modify the `values.yaml` file for Helm:
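A sketch assuming the official chart’s `secret` list, which injects Secret keys into Airflow pods as environment variables:

```yaml
secret:
  - envName: "AIRFLOW__CORE__FERNET_KEY"
    secretName: "airflow-secrets"
    secretKey: "fernet-key"
  - envName: "POSTGRES_PASSWORD"
    secretName: "airflow-secrets"
    secretKey: "postgres-password"
```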
Then, apply the update:
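For example, reusing the release name and namespace from the installation step:

```sh
helm upgrade airflow apache-airflow/airflow --namespace airflow -f values.yaml
```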
✔️ Benefits of Using Kubernetes Secrets:
✅ Prevents hardcoding credentials in Helm or config files
✅ Easier rotation and updating of secrets
✅ Keeps credentials out of source control (and encrypted at rest when Secret encryption is enabled on the cluster)
2. Implementing Role-Based Access Control (RBAC) for Airflow Security
RBAC ensures that only authorized users can perform actions on Airflow DAGs, connections, and configurations.
Step 1: Enable RBAC in Airflow
Modify `values.yaml` to enable RBAC:
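A sketch assuming the chart’s `rbac` block, which controls whether Kubernetes Role/RoleBinding resources are created for Airflow; Airflow’s own UI role model (Flask AppBuilder) is enabled by default in Airflow 2.x:

```yaml
rbac:
  create: true
```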
Step 2: Define Kubernetes RBAC Roles
Create an RBAC policy for Airflow in `rbac.yaml`:
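A sketch of a namespaced Role covering the permissions described below; tighten the resources and verbs to what your executor actually needs:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airflow-role
  namespace: airflow
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "secrets"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "delete"]
```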
✔️ How it Works:
Grants Airflow access to manage pods and secrets.
Allows DAG execution by permitting job creation.
Step 3: Bind Roles to Users
Assign roles using RoleBindings:
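A sketch binding the Role to the service account used by Airflow pods (the account name is an assumption; check what your chart creates):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow-rolebinding
  namespace: airflow
subjects:
  - kind: ServiceAccount
    name: airflow
    namespace: airflow
roleRef:
  kind: Role
  name: airflow-role
  apiGroup: rbac.authorization.k8s.io
```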
Apply the RBAC policies:
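Assuming both objects are saved in `rbac.yaml`:

```sh
kubectl apply -f rbac.yaml
```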
✔️ Benefits of RBAC in Airflow:
✅ Restricts unauthorized access to critical components
✅ Enables controlled access for different team roles (e.g., Developers vs Admins)
✅ Enhances Kubernetes-native security policies
3. Setting Up Authentication for the Airflow Web UI
If authentication is not configured, Airflow’s web UI can be left open to anyone who can reach it, which is a security risk.
Enforce user authentication using:
Username-password login (built-in auth)
OAuth (Google, GitHub, Okta, etc.)
Option 1: Enabling Built-in Authentication
Modify `values.yaml`:
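A sketch using the chart’s `webserver.defaultUser` block to bootstrap an admin account (keys may vary by chart version; set the password from a Secret rather than plain text in practice):

```yaml
webserver:
  defaultUser:
    enabled: true
    role: Admin
    username: admin
    email: admin@example.com
    firstName: Admin
    lastName: User
    password: change-me
```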
Then create a new user:
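For example, by running the Airflow CLI inside the webserver pod (the deployment name assumes a release called `airflow`):

```sh
kubectl exec -it deploy/airflow-webserver -n airflow -- \
  airflow users create \
    --username jane \
    --firstname Jane \
    --lastname Doe \
    --role Admin \
    --email jane@example.com \
    --password 'choose-a-strong-password'
```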
Option 2: Enabling OAuth for Single Sign-On (SSO)
To use Google OAuth, modify `webserver_config.py`:
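A sketch of a Google OAuth setup using Flask AppBuilder’s OAuth support; the client ID and secret come from environment variables, and restricting logins to a specific email domain typically requires a small custom security manager on top of this:

```python
import os

from flask_appbuilder.security.manager import AUTH_OAUTH

AUTH_TYPE = AUTH_OAUTH
AUTH_USER_REGISTRATION = True            # create Airflow users on first login
AUTH_USER_REGISTRATION_ROLE = "Viewer"   # default role for new users

OAUTH_PROVIDERS = [
    {
        "name": "google",
        "icon": "fa-google",
        "token_key": "access_token",
        "remote_app": {
            "client_id": os.environ["GOOGLE_CLIENT_ID"],
            "client_secret": os.environ["GOOGLE_CLIENT_SECRET"],
            "api_base_url": "https://www.googleapis.com/oauth2/v2/",
            "client_kwargs": {"scope": "email profile"},
            "request_token_url": None,
            "access_token_url": "https://accounts.google.com/o/oauth2/token",
            "authorize_url": "https://accounts.google.com/o/oauth2/auth",
        },
    }
]
```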
✔️ How it Works:
Requires users to log in via Google before accessing the Airflow UI.
Restricts access to users with an allowed email domain (e.g., `yourcompany.com`).
Next Steps
Securing your Airflow deployment on Kubernetes ensures that sensitive data remains protected, unauthorized access is restricted, and the system remains resilient to attacks.
The next section will cover monitoring and troubleshooting Airflow on Kubernetes, including log aggregation, performance tuning, and debugging common issues.
Monitoring and Troubleshooting Airflow on Kubernetes
Once Apache Airflow is deployed on Kubernetes, it’s essential to monitor its performance and troubleshoot issues efficiently.
This ensures that DAGs run smoothly, worker pods scale properly, and failures are quickly detected and resolved.
This section covers:
✅ Using Prometheus and Grafana for monitoring Airflow performance
✅ Debugging failed tasks and pod crashes
✅ Common Kubernetes deployment issues and fixes
1. Using Prometheus and Grafana for Monitoring Airflow Performance
Apache Airflow does not provide built-in monitoring dashboards, but you can integrate Prometheus (for metrics collection) and Grafana (for visualization).
Step 1: Install the Prometheus and Grafana Stack
If you don’t have Prometheus installed, deploy it using Helm:
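For example, from the prometheus-community charts (the `monitoring` namespace is an assumption):

```sh
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/prometheus --namespace monitoring --create-namespace
```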
Then install Grafana:
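Again a sketch, using the same `monitoring` namespace:

```sh
helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana --namespace monitoring
```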
Step 2: Expose Airflow Metrics for Prometheus
Modify `values.yaml` to enable Prometheus metrics in Airflow:
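A minimal sketch assuming the chart’s bundled StatsD exporter, which exposes Airflow metrics in a format Prometheus can scrape:

```yaml
statsd:
  enabled: true   # deploys a statsd-exporter; point a Prometheus scrape job or ServiceMonitor at it
```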
Apply the update:
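For example:

```sh
helm upgrade airflow apache-airflow/airflow --namespace airflow -f values.yaml
```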
Step 3: Add Airflow Dashboards in Grafana
Log in to Grafana (`http://<grafana-ip>:3000`, default user: `admin`, password: `admin`).
Import the Airflow Dashboard JSON from Grafana’s dashboard repository.
Connect it to the Prometheus data source.
✔️ Key Metrics to Monitor:
✅ DAG run durations (`airflow_dag_run_duration_seconds`)
✅ Task execution time (`airflow_task_duration`)
✅ Worker pod CPU and memory usage
✅ Scheduler performance and task queue size
2. Debugging Failed Tasks and Pod Crashes
Failed tasks or pod crashes can disrupt workflows.
Use the following methods to diagnose and resolve Airflow issues.
Step 1: Check Airflow Logs
Get logs from a failed DAG task:
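For example (pod names are placeholders; list the pods first to find the worker that ran the task):

```sh
kubectl get pods -n airflow
kubectl logs <worker-pod-name> -n airflow
```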
Alternatively, view logs inside the Airflow UI:
Go to “DAGs” → Click on a failed DAG
Click on “Graph View” → Select the failed task
Click “View Log”
Step 2: Restart a Failed Worker Pod
If an Airflow worker pod crashes, restart it:
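For example (the pod is recreated by its Deployment or StatefulSet):

```sh
kubectl delete pod <worker-pod-name> -n airflow
```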
Kubernetes will automatically create a new pod.
Step 3: Check for Resource Exhaustion
List all running Airflow pods and check their status:
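For example:

```sh
kubectl get pods -n airflow
kubectl describe pod <worker-pod-name> -n airflow   # check Last State and Events for OOMKilled
```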
If you see OOMKilled (Out of Memory Killed) errors, increase the worker pod memory in `values.yaml`:
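For example (assuming the chart’s `workers.resources` block; pick limits that fit your node sizes):

```yaml
workers:
  resources:
    requests:
      memory: "2Gi"
    limits:
      memory: "4Gi"
```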
Apply the changes:
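Again:

```sh
helm upgrade airflow apache-airflow/airflow --namespace airflow -f values.yaml
```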
3. Common Kubernetes Deployment Issues and Fixes
| Issue | Cause | Fix |
| --- | --- | --- |
| DAGs are not updating | Git-Sync is not running properly | Restart the Git-Sync sidecar with `kubectl rollout restart deployment airflow-scheduler -n airflow` |
| Worker pods keep restarting | Insufficient memory allocation | Increase memory requests/limits in `values.yaml` |
| DAG tasks stuck in “queued” state | Scheduler backlog or missing worker pods | Check scheduler logs (`kubectl logs <scheduler-pod>`), ensure worker pods are running |
| Database connection errors | Airflow database pod is down | Restart the database pod: `kubectl delete pod <db-pod> -n airflow` |
Monitoring and troubleshooting are critical for maintaining a stable Airflow deployment on Kubernetes.
By integrating Prometheus and Grafana, tracking logs, and diagnosing common errors, teams can ensure smooth DAG execution and system performance.
Conclusion
Key Takeaways
Deploying Apache Airflow on Kubernetes provides scalability, resource efficiency, and isolation, making it an ideal choice for managing complex workflows.
Throughout this guide, we covered:
✅ Setting up Airflow on Kubernetes using Helm for streamlined deployment.
✅ Managing DAGs and dependencies to keep environments in sync.
✅ Scaling Airflow effectively using Kubernetes autoscaling strategies.
✅ Securing Airflow deployments with RBAC, secrets management, and authentication.
✅ Monitoring and troubleshooting using Prometheus, Grafana, and Kubernetes logs.
By leveraging Kubernetes, teams can automate workflows, dynamically allocate resources, and deploy Airflow in a robust, scalable manner.
Next Steps for Optimizing Airflow on Kubernetes
To further enhance your Airflow deployment, consider:
🚀 Optimizing resource allocation to prevent bottlenecks and maximize efficiency.
🔄 Implementing CI/CD pipelines for DAG updates and automated testing.
🛡️ Enhancing security with fine-grained access control and encrypted configurations.
By continuously refining your Airflow on Kubernetes setup, you can streamline workflow automation, improve reliability, and scale efficiently across different environments.