Airflow: TaskFlow API
Notes on Airflow's TaskFlow API
First of all, the term API in this context is confusing to me. I think of it more like an SDK, or just a different way of writing DAG code. It's not a server, and it's not an additional API beyond the Airflow API. Essentially it's just a way of creating DAGs and Tasks using decorators.
Old way: Manually instantiating operator classes (like writing Kubernetes YAML)
New way: Using @task
to generate the operators for you (like using Helm charts)
What are the main advantages of using the @task
decorator, rather than traditional way of creating tasks with operators like BashOperator
or PythonOperator
?
- It handles XComs automatically. The return value of the decorated function gets pushed to XComs, and you can use the return value as input to another task.
- Type hints allow you to pass around complex Python objects between tasks without manual JSON serialization.
- Less boilerplate code.
How do you define task dependencies using the TaskFlow API?
You just call the decorated function and store the return value as a variable:
order_data = extract()
total_orders = transform(order_data)
load(total_orders, table)
Airflow will understand the flow of dependencies. The transform()
task depends on the extract()
task. So you don't need to write extract >> transform
.
Types of @task
s
Decorator | Underlying Operator |
---|---|
@task |
PythonOperator |
@task.virtualenv |
PythonVirtualEnvOperator |
@task.docker |
DockerOperator |
@task.kubernetes |
KubernetesPodOperator |
@task.bash |
BashOperator |
@task.python |
Alias for @task |
These decorators are provided by providers. Each provider has some metadata, including something like this:
task-decorators:
- class-name: airflow.providers.docker.decorators.docker.docker_task
name: docker
So if you want to find the source code of the decorator, you should look in the provider package (example, docker).
The full flow:
- Airflow starts up
ProvidersManager
scans all installed provider packages- Finds
provider.yaml
files like this Docker one - Reads the
task-decorators
section - Registers
"docker"
→"airflow.providers.docker.decorators.docker.docker_task"
- When you write
@task.docker
, the dynamic lookup finds it - Your function gets wrapped by the real Docker decorator