onprema

Airflow: XComs

XComs (cross communication) allow the sharing of data between tasks.

There are two main functions that Task objects can call: xcom_push() and xcom_pull()

xcom_push() will write the data
xcom_pull() will read the data

By default, data is written to the Airflow Metadata Database.

Note: this is not a solution for doing ETL within Airflow -- Best practice is to use Airflow as an orchestration tool, and let a remote compute cluster do the large ETL operations. But sometimes it is helpful to pass smaller pieces of data from task to task.

Here is how XComs are represented in the metastore:

SELECT 
    column_name,
    data_type,
    is_nullable,
    column_default,
    character_maximum_length
FROM information_schema.columns 
WHERE table_name = 'xcom'
ORDER BY ordinal_position;
column_name|data_type               |is_nullable|column_default|character_maximum_length|
-----------+------------------------+-----------+--------------+------------------------+
dag_run_id |integer                 |NO         |              |                        |
task_id    |character varying       |NO         |              |                     250|
map_index  |integer                 |NO         |'-1'::integer |                        |
key        |character varying       |NO         |              |                     512|
dag_id     |character varying       |NO         |              |                     250|
run_id     |character varying       |NO         |              |                     250|
value      |bytea                   |YES        |              |                        |
timestamp  |timestamp with time zone|NO         |              |                        |

Limitations

XComs have a limit in size that depends on the database that the metastore is using:

(before Airflow 2.9, you could only share 64Kb)

The data must be JSON serializable

#airflow