onprema

Airflow: DAG Versioning and Bundles

What problem(s) it solves

People change their DAGs over time. Before Airflow 3, Airflow always used the most recent version of the DAG code and assumed that code applied to all past runs as well. This can be very confusing.

If the DAG code is updated while a run is in progress, tasks within that same run can end up executing different versions of the code.

Implementation

Airflow 3 automatically keeps track of changes in your DAGs.

A "change" only counts when it affects the serialized DAG (SerDAG).

What is SerDAG?

When you define a DAG in Python, Airflow needs to store a representation of it (tasks, dependencies, configuration) in the database. It doesn't store the DAG as a live Python object; the DAG is serialized into JSON first. Serialization is done by the DAG processor, which historically ran as part of the scheduler.

[...]the DagFileProcessorProcess in the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB as SerializedDagModel model.
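
To make "a change that affects SerDAG" concrete, here is a minimal sketch using only the standard library. It is not Airflow's serializer: `serialize_dag` and `dag_hash` are hypothetical helpers, and the fields they keep (DAG id, schedule, task ids, upstream dependencies) are an assumption chosen for illustration. The idea is only that the structure is reduced to JSON and two JSON documents can be compared by hash.

```python
import hashlib
import json


def serialize_dag(dag_id, schedule, tasks):
    """Toy stand-in for DAG serialization: keeps structure, drops Python code."""
    return {
        "dag_id": dag_id,
        "schedule": schedule,
        "tasks": [
            {"task_id": t["task_id"], "upstream": sorted(t.get("upstream", []))}
            for t in sorted(tasks, key=lambda t: t["task_id"])
        ],
    }


def dag_hash(serialized):
    """Hash the canonical JSON form so two serializations can be compared."""
    payload = json.dumps(serialized, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


v1 = serialize_dag(
    "etl",
    "@daily",
    [{"task_id": "extract"}, {"task_id": "load", "upstream": ["extract"]}],
)

# Same structure, tasks listed in a different order in the file.
same = serialize_dag(
    "etl",
    "@daily",
    [{"task_id": "load", "upstream": ["extract"]}, {"task_id": "extract"}],
)

# A new task and a rewired dependency: the structure itself is different.
v2 = serialize_dag(
    "etl",
    "@daily",
    [
        {"task_id": "extract"},
        {"task_id": "transform", "upstream": ["extract"]},
        {"task_id": "load", "upstream": ["transform"]},
    ],
)

print(dag_hash(v1) == dag_hash(same))  # True: nothing structural changed
print(dag_hash(v1) == dag_hash(v2))    # False: this would count as a change
```

In this toy, reordering tasks in the file produces the same hash, while adding a task does not. Which fields the real serialized DAG includes is decided by Airflow, not by this sketch.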

What is considered a change?

Airflow makes a new version of the DAG if...

Airflow doesn't make a new version if...

DAG Bundles

A DAG bundle is a collection of files containing DAG code and supporting files. Bundles are named after the backend they use to store the DAG code. They are the mechanism that enables DAG versioning.

They can be versioned or unversioned.

The default DAG bundle is the `LocalDagBundle`, which is not versioned. There is also `GitDagBundle`, which is versioned. Other bundles are expected in the future. Kinda sounds like OCI.

select * from dag_bundle;
//
name       |active|version|last_refreshed               |
-----------+------+-------+-----------------------------+
dags-folder|true  |       |2025-08-11 10:43:15.884 -0400|
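
The empty `version` column above is what you would expect from an unversioned bundle. A toy sketch of the difference, with no Airflow imports: `ToyLocalBundle`, `ToyGitBundle`, and `pin_run` are hypothetical names, and the `supports_versioning` / `get_current_version` shape is my assumption about how a bundle reports its version, not a copy of the real classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyLocalBundle:
    """Unversioned: files on disk, no identifier for 'the code as of run X'."""
    name: str
    path: str
    supports_versioning = False  # class attribute, not a dataclass field

    def get_current_version(self) -> Optional[str]:
        return None


@dataclass
class ToyGitBundle:
    """Versioned: the current commit identifies an exact snapshot of the code."""
    name: str
    head_commit: str
    supports_versioning = True  # class attribute, not a dataclass field

    def get_current_version(self) -> Optional[str]:
        return self.head_commit


def pin_run(bundle):
    """Record which bundle (and version, if any) a DAG run started with."""
    return {
        "bundle_name": bundle.name,
        "bundle_version": bundle.get_current_version(),
    }


local = ToyLocalBundle(name="dags-folder", path="/opt/airflow/dags")
git = ToyGitBundle(name="my-git-dags", head_commit="0123abc")

print(pin_run(local))  # {'bundle_name': 'dags-folder', 'bundle_version': None}
print(pin_run(git))    # {'bundle_name': 'my-git-dags', 'bundle_version': '0123abc'}
```

With a version recorded, a run can in principle keep using the snapshot it started with; with `None`, there is nothing to pin to, so the run sees whatever is on disk.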

To get the `GitDagBundle` up and running, you would need to:
- push the DAG code to GitHub
- install the git package (`apt install git`)
- install the Airflow Git provider (`apache-airflow-providers-git`)
- define an Airflow connection for Git
- configure the DAG bundle in the Airflow config (`AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST`)
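
The last step above expects a JSON list, which is easy to get wrong by hand. Below is a small sketch that builds the value in Python and prints an `export` line. The bundle name `my-git-dags`, the connection id `my_git_conn`, the branch `main`, and the `dags` subdirectory are placeholders. The `classpath` and the `kwargs` keys reflect my understanding of the Git provider and should be checked against the documentation for the installed version.

```python
import json

# One entry per bundle; all values below are placeholders for illustration.
bundle_config = [
    {
        "name": "my-git-dags",
        # Assumed import path of GitDagBundle in the Git provider.
        "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
        "kwargs": {
            "git_conn_id": "my_git_conn",  # the Airflow connection defined earlier
            "tracking_ref": "main",        # branch or tag to follow
            "subdir": "dags",              # folder inside the repo holding DAG files
        },
    }
]

# Compact JSON keeps the environment variable on a single line.
env_value = json.dumps(bundle_config)

print(f"export AIRFLOW__DAG_PROCESSOR__DAG_BUNDLE_CONFIG_LIST='{env_value}'")
```

Generating the value with `json.dumps` avoids quoting mistakes such as single-quoted keys or trailing commas, which are not valid JSON.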

## Notes
- You can still mess things up if you change a DAG while it is running, even with versioning, so it is probably best to pause a critical DAG while an update is deploying.

#airflow