Airflow: DAG Versioning and Bundles
What problem(s) it solves
People change their DAGs over time. Before Airflow 3, Airflow only knew about the most recent version of the DAG code and assumed that code applied to all past runs as well, which can be very confusing.
Updating DAG code while a run is in progress can also cause tasks within that run to execute different versions of the code.
Implementation
Airflow 3 automatically keeps track of changes to your DAGs.
A change creates a new DAG version when it affects the serialized DAG (SerDAG).
What is SerDAG?
When you define a DAG in Python, Airflow needs to store a representation of it (tasks, dependencies, configuration) in the metadata database. It doesn't store the raw Python code there; instead, the DAG is serialized into JSON. Serialization is done by the DAG file processor (part of the scheduler in Airflow 2, a separate `airflow dag-processor` component in Airflow 3).
[...] the `DagFileProcessorProcess` in the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB as `SerializedDagModel` model.
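To make the "structure, not source" idea concrete, here is a minimal toy serializer in plain Python. It is not Airflow's serializer: `ToyTask`, `serialize_dag`, the `etl` DAG and the set of fields kept are all hypothetical, chosen to mirror the kinds of changes discussed in this section. What it shows is that the JSON carries task ids, operator types, config and dependencies, while the Python callable itself is never written out.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyTask:
    # Hypothetical stand-in for an operator instance; not an Airflow class.
    task_id: str
    operator: str  # operator class name, e.g. "BashOperator"
    retries: int = 0
    downstream: List[str] = field(default_factory=list)
    python_callable: Optional[Callable] = None  # deliberately NOT serialized


def serialize_dag(dag_id: str, schedule: str, tasks: List[ToyTask]) -> str:
    """Return a JSON string describing structure and config only."""
    data = {
        "dag_id": dag_id,
        "schedule": schedule,
        "tasks": [
            {
                "task_id": task.task_id,
                "operator": task.operator,
                "retries": task.retries,
                "downstream": sorted(task.downstream),
            }
            for task in sorted(tasks, key=lambda x: x.task_id)
        ],
    }
    # sort_keys makes the output stable: same structure -> same string.
    return json.dumps(data, sort_keys=True)


def extract():
    return [1, 2, 3]


tasks = [
    ToyTask("extract", "PythonOperator", downstream=["load"], python_callable=extract),
    ToyTask("load", "BashOperator", retries=2),
]
print(serialize_dag("etl", "@daily", tasks))
```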
What is considered a change?
Airflow makes a new version of the DAG if...
- tasks are added or removed
- task ids are changed
- modifying task types (`BashOperator` -> `PythonOperator`, for example)
- changing task dependencies / relationships
- modifying task configs that affect execution (`retries`, `retry_delay`, `pool`, etc.)
- changing dag parameters (`schedule`, `catchup`, `max_active_runs`, etc.)
- modifying dag-level configurations
- adding or removing dag tags
- changes to taskgroup structures
- changing the task dependency graph
Airflow doesn't make a new version if...
- python function code changes
- changing the bash command in a `BashOperator`, for example
- updating SQL queries in a `SqlOperator`
- changes to business logic inside task functions
- changes to the dag description, or added comments
- renaming variables in Python code
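One way to picture the two lists above: hash the serialized structure and start a new version only when that hash changes. The sketch below is a hypothetical model, not Airflow internals; `serialize`, `ToyVersionTracker`, the field list and the choice of SHA-256 are all assumptions. It shows why editing a function body leaves the version alone while bumping `retries` creates a new one.

```python
import hashlib
import json


def serialize(tasks):
    # Keep structural/config fields only; the callable is deliberately dropped.
    keep = ("task_id", "operator", "retries", "downstream")
    return {"tasks": [{k: t[k] for k in keep} for t in tasks]}


def structure_hash(structure):
    # Canonical JSON -> stable hash (SHA-256 chosen arbitrarily for this sketch).
    payload = json.dumps(structure, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ToyVersionTracker:
    """Hypothetical: start a new version only when the structure hash changes."""

    def __init__(self):
        self.hashes = []  # one entry per version; version number = position + 1

    def observe(self, tasks):
        h = structure_hash(serialize(tasks))
        if not self.hashes or self.hashes[-1] != h:
            self.hashes.append(h)
        return len(self.hashes)


def double(rows):
    return [r * 2 for r in rows]


def triple(rows):
    return [r * 3 for r in rows]


def make_tasks(fn, retries=0):
    return [{"task_id": "transform", "operator": "PythonOperator",
             "retries": retries, "downstream": [], "callable": fn}]


tracker = ToyVersionTracker()
v_initial = tracker.observe(make_tasks(double))
v_logic = tracker.observe(make_tasks(triple))              # business logic edit
v_config = tracker.observe(make_tasks(triple, retries=3))  # execution config edit
print(v_initial, v_logic, v_config)  # prints: 1 1 2
```

Because the callable is dropped before hashing, anything that lives only inside it is invisible to this kind of versioning.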
DAG Bundles
A DAG bundle is a collection of files containing DAG code and supporting files. Bundle types are named after the backend they use to store that code. Bundles are the mechanism that enables DAG versioning.
They can be versioned or unversioned.
The default DAG bundle is the `LocalDagBundle`, which is not versioned. There is also the `GitDagBundle`, which is versioned. More bundle types are expected in the future. Kinda sounds like OCI.
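With the default setup, the `dag_bundle` table in the metadata DB shows the `dags-folder` bundle with an empty `version`:

```
select * from dag_bundle;

name       |active|version|last_refreshed               |
-----------+------+-------+-----------------------------+
dags-folder|true  |       |2025-08-11 10:43:15.884 -0400|
```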
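The versioned/unversioned split can be sketched with two toy classes. Everything here is hypothetical (`ToyLocalBundle`, `ToyGitBundle`, `pin_run`, the commit ids, the `/opt/airflow/dags` path) and only loosely inspired by the bundle idea; it is not Airflow's bundle API. A local folder has no history, so a run can't be pinned to anything (matching the empty `version` for `dags-folder` in the `dag_bundle` table); a Git-backed bundle can record the commit a run started on and still resolve it after new commits land.

```python
from typing import List, Optional


class ToyLocalBundle:
    """Hypothetical unversioned bundle: whatever is in the folder right now."""

    supports_versioning = False

    def __init__(self, name: str, path: str):
        self.name = name
        self.path = path

    def current_version(self) -> Optional[str]:
        return None  # a plain folder has no history to point at


class ToyGitBundle:
    """Hypothetical versioned bundle: the version is a commit id."""

    supports_versioning = True

    def __init__(self, name: str, commits: List[str]):
        self.name = name
        self.commits = list(commits)  # stand-in for repo history, oldest first

    def current_version(self) -> Optional[str]:
        return self.commits[-1]

    def checkout(self, version: str) -> str:
        # A real Git-backed bundle would materialize the files at this commit.
        if version not in self.commits:
            raise KeyError(f"unknown version {version!r}")
        return version


def pin_run(bundle) -> Optional[str]:
    """Version a new DAG run should stick to, or None if the bundle can't pin."""
    return bundle.current_version() if bundle.supports_versioning else None


local = ToyLocalBundle("dags-folder", "/opt/airflow/dags")
git = ToyGitBundle("my-repo", ["a1b2c3", "d4e5f6"])

run_version = pin_run(git)    # run starts on "d4e5f6"
git.commits.append("9f8e7d")  # a new commit lands while the run is in flight
print(pin_run(local), run_version, git.current_version())  # None d4e5f6 9f8e7d
```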
To get the `GitDagBundle` up and running you need to:
- push your DAG code to a Git repository (e.g. on GitHub)
- install git (`apt install git`)
- install the Airflow Git provider (`apache-airflow-providers-git`)
- define a Git connection in Airflow
- configure the DAG bundle in the Airflow config (`AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST`)
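For the last step, the config value is a JSON list of bundle definitions. A minimal sketch that builds it in Python and prints a shell `export` line, assuming the Airflow 3.x classpaths and the `GitDagBundle` kwargs `git_conn_id` and `tracking_ref`; `my_git_repo` and `my_git_conn` are placeholder names, and the classpaths and kwargs should be checked against the docs for your installed Airflow and Git provider versions.

```python
import json

# Assumed classpaths/kwargs for Airflow 3.x; verify against your versions.
# "my_git_repo" and "my_git_conn" are hypothetical placeholder names.
bundles = [
    {
        "name": "dags-folder",
        "classpath": "airflow.dag_processing.bundles.local.LocalDagBundle",
        "kwargs": {},
    },
    {
        "name": "my_git_repo",
        "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
        "kwargs": {
            "git_conn_id": "my_git_conn",  # the Git connection defined earlier
            "tracking_ref": "main",        # branch (or other ref) to follow
        },
    },
]

value = json.dumps(bundles)
print(f"export AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST='{value}'")
```

Keeping the `dags-folder` entry is optional; it just lets the local folder and the Git repo be served side by side.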
## Notes
- versioning doesn't make it safe to change a DAG while it is running; you can still mess things up, so it's probably best to pause a critical DAG while the update is being deployed.
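The pause-while-deploying advice can be wrapped in a small context manager. This is a sketch: `paused_during_deploy`, the `set_paused` hook and the `critical_etl` DAG id are hypothetical; in a real setup the hook might shell out to the `airflow dags pause` / `airflow dags unpause` CLI commands. Pausing keeps the scheduler from starting new runs of the DAG during the rollout.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def paused_during_deploy(
    dag_id: str, set_paused: Callable[[str, bool], None]
) -> Iterator[None]:
    """Pause `dag_id` for the duration of a deploy.

    Unpauses only if the deploy block finishes without raising; on failure
    the exception propagates and the DAG stays paused.
    """
    set_paused(dag_id, True)
    yield
    set_paused(dag_id, False)


calls = []


def record(dag_id: str, paused: bool) -> None:
    # Hypothetical hook; a real one might run `airflow dags pause <dag_id>`
    # or `airflow dags unpause <dag_id>`.
    calls.append((dag_id, paused))


with paused_during_deploy("critical_etl", record):
    pass  # push the new DAG code / bundle version here

print(calls)
```

Leaving the DAG paused after a failed deploy is a deliberate choice: someone has to look at it before it runs again.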