
Airflow: TaskFlow API

Notes on Airflow's TaskFlow API


First of all, the term API in this context is confusing to me. I think of it more like an SDK, or just a different way of writing DAG code. It's not a server, and it's not an additional API beyond the Airflow API. Essentially it's just a way of creating DAGs and Tasks using decorators.

Old way: Manually instantiating operator classes (like writing Kubernetes YAML)
New way: Using @task to generate the operators for you (like using Helm charts)
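
For contrast, here's a minimal sketch of both styles inside one DAG (assuming Airflow 2.x; the DAG id, dates, and function bodies are just placeholders):

import pendulum
from airflow import DAG
from airflow.decorators import task
from airflow.operators.python import PythonOperator

def say_hello():
    print("hello")

with DAG(dag_id="hello_styles", start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), schedule=None):
    # Old way: instantiate the operator class yourself
    PythonOperator(task_id="say_hello_classic", python_callable=say_hello)

    # New way: @task builds an equivalent PythonOperator from the function
    @task
    def say_hello_taskflow():
        print("hello")

    say_hello_taskflow()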


What are the main advantages of using the @task decorator, rather than the traditional way of creating tasks with operators like BashOperator or PythonOperator?

Mostly less boilerplate: return values move between tasks via XCom automatically, and dependencies are inferred from plain function calls instead of explicit >> wiring.


How do you define task dependencies using the TaskFlow API?

You just call the decorated function and store the return value as a variable:

order_data = extract()
total_orders = transform(order_data)
load(total_orders)

Airflow infers the dependencies from these calls: transform() depends on extract(), and load() depends on transform(). So you don't need to write extract >> transform.
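
Zooming out, a complete TaskFlow DAG around those three calls could look roughly like this (a sketch modeled on the pattern above; the task bodies and values are made up):

import pendulum
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2024, 1, 1, tz="UTC"), catchup=False)
def order_pipeline():

    @task
    def extract() -> dict:
        # Pretend this pulls order data from an API
        return {"1001": 30.0, "1002": 45.5}

    @task
    def transform(order_data: dict) -> float:
        # The return value is passed to the next task via XCom
        return sum(order_data.values())

    @task
    def load(total_orders: float) -> None:
        print(f"Total order value: {total_orders}")

    order_data = extract()
    total_orders = transform(order_data)
    load(total_orders)

order_pipeline()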


Types of @task decorators

Decorator          Underlying Operator
@task              PythonOperator
@task.virtualenv   PythonVirtualenvOperator
@task.docker       DockerOperator
@task.kubernetes   KubernetesPodOperator
@task.bash         BashOperator
@task.python       Alias for @task
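
A couple of these variants sketched out (assuming Airflow 2.9+ for @task.bash; the requirements and commands are made up):

from airflow.decorators import task

@task.bash
def archive_logs() -> str:
    # For @task.bash, the function returns the bash command to execute
    return "tar czf /tmp/logs.tgz /tmp/logs"

@task.virtualenv(requirements=["pandas==2.2.2"], system_site_packages=False)
def count_rows(path: str) -> int:
    # Runs in an isolated virtualenv, so imports go inside the function
    import pandas as pd
    return len(pd.read_csv(path))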

These decorators are provided by provider packages. Each provider ships metadata (a provider.yaml file) that includes something like this:

task-decorators:
  - class-name: airflow.providers.docker.decorators.docker.docker_task
    name: docker

So if you want to find the source code of a decorator, look in the corresponding provider package (for example, apache-airflow-providers-docker for @task.docker).
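
If the Docker provider is installed, you can sanity-check that dotted path by importing it directly (a quick sketch; it assumes apache-airflow-providers-docker is installed):

# The dotted path comes straight from the provider metadata above
from airflow.providers.docker.decorators.docker import docker_task

print(docker_task)  # the callable that @task.docker ultimately dispatches to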

The full flow:

  1. Airflow starts up
  2. ProvidersManager scans all installed provider packages
  3. Finds provider.yaml files like this Docker one
  4. Reads the task-decorators section
  5. Registers "docker" → "airflow.providers.docker.decorators.docker.docker_task"
  6. When you write @task.docker, the dynamic lookup finds it
  7. Your function gets wrapped by the real Docker decorator
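
To make steps 5-7 concrete, here's a hypothetical sketch of what the lookup amounts to (these names are illustrative, not Airflow's actual internals):

from importlib import import_module

# What ProvidersManager effectively builds from the provider.yaml files
registry = {"docker": "airflow.providers.docker.decorators.docker.docker_task"}

def resolve(name: str):
    module_path, _, attr = registry[name].rpartition(".")
    return getattr(import_module(module_path), attr)

docker_decorator = resolve("docker")  # roughly what @task.docker dispatches to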

#airflow