Airflow: XComs
XComs (cross communication) allow the sharing of data between tasks.
There are two main functions that Task objects can call: xcom_push()
and xcom_pull()
xcom_push()
will write the data
xcom_pull()
will read the data
By default, data is written to the Airflow Metadata Database.
Note: this is not a solution for doing ETL within Airflow -- Best practice is to use Airflow as an orchestration tool, and let a remote compute cluster do the large ETL operations. But sometimes it is helpful to pass smaller pieces of data from task to task.
Here is how XComs are represented in the metastore:
SELECT
column_name,
data_type,
is_nullable,
column_default,
character_maximum_length
FROM information_schema.columns
WHERE table_name = 'xcom'
ORDER BY ordinal_position;
column_name|data_type |is_nullable|column_default|character_maximum_length|
-----------+------------------------+-----------+--------------+------------------------+
dag_run_id |integer |NO | | |
task_id |character varying |NO | | 250|
map_index |integer |NO |'-1'::integer | |
key |character varying |NO | | 512|
dag_id |character varying |NO | | 250|
run_id |character varying |NO | | 250|
value |bytea |YES | | |
timestamp |timestamp with time zone|NO | | |
Limitations
XComs have a limit in size that depends on the database that the metastore is using:
- SQLite: 2Gb per XCom
- Postgres: 1Gb per XCom
- MySQL: 64Mb per XCom
(before Airflow 2.9, you could only share 64Kb)
The data must be JSON serializable