Airflow: TaskGroups
What problem(s) it solves
Large DAGs with 50+ individual tasks becomes hard to manage, hard to visualize in the UI, and lead to lots of duplication.
If Airflow tasks are like the messy collection of files on your desktop, then TaskGroups are like folders that organize and contain tasks in a particular way.
Implementation
Two main approaches:
@task_group
decorator
@task_group(group_id="data_processing")
def process_data():
task1 = Operator(task_id='foo')
task2 = Operator(task_id='bar')
task1 >> task2
process_data() # -> must call the function to create the group!
TaskGroup
context manager:
with TaskGroup(group_id="foo") as tg:
task1 = Operator(task_id='bar')
task2 = Operator(task_id='baz')
task1 >> task2
Examples:
Notes
- TaskGroups are on a per-DAG level. You can't group tasks across different DAGs.
- TaskGroups integrate with the TaskFlow API so you can pass data from taskgroup to taskgroup using xcoms