What does Atlan crawl from Apache Airflow/OpenLineage?

Once you have integrated Apache Airflow/OpenLineage, you can use connector-specific filters for quick asset discovery. The following filters are currently supported:

  • Status filter - last run status for an asset
  • Duration filter - last run duration for an asset

Atlan maps the following assets and properties from Apache Airflow/OpenLineage. Asset lineage support depends on the list of operators supported by OpenLineage.

DAGs

Atlan maps DAGs (directed acyclic graphs) from Apache Airflow/OpenLineage to its AirflowDAG asset type.

Source property | Atlan property | Description
job.name | name | Name of the Airflow DAG
- | qualifiedName | Unique identifier for the DAG in Atlan
description | description | Description of the DAG from Airflow
owners | sourceOwners | Original owner information from Airflow
- | ownerUsers | Validated Atlan usernames (mapped from source owners)
schedule_interval | airflowDagSchedule | DAG's schedule interval (cron expression or preset)
delta | airflowDagScheduleDelta | Schedule interval in seconds
tags | airflowTags | Tags assigned to the DAG
run_id | airflowRunName | Unique identifier for the DAG run
run_type | airflowRunType | Type of run (scheduled, manual, backfill)
eventTime (start) | airflowRunStartTime | Timestamp when the DAG run started
eventTime (end) | airflowRunEndTime | Timestamp when the DAG run completed
eventType | airflowRunOpenLineageState | Final status of the DAG run
version | airflowRunVersion | Airflow version
openlineageAdapterVersion | airflowRunOpenLineageVersion | OpenLineage adapter version
- | sourceURL | Direct link to the DAG in Airflow UI
- | connectionName | Name of the connector instance
- | connectionQualifiedName | Unique identifier for the connector instance
- | connectorName | Name of the connector type
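
To illustrate how the run-level properties relate to OpenLineage events, the following is a minimal sketch that folds the events of a single run into the Atlan property names listed for DAGs. It assumes events are plain dictionaries carrying the standard OpenLineage `eventType`, `eventTime`, and `job.name` fields. The `summarize_run` and `parse_time` helpers are hypothetical and do not represent how Atlan processes events internally.

```python
from datetime import datetime

# OpenLineage event types that end a run.
TERMINAL_TYPES = {"COMPLETE", "FAIL", "ABORT"}


def parse_time(value):
    """Parse an ISO 8601 eventTime, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_run(events):
    """Fold the events of one run into Atlan-style run properties (illustrative only)."""
    props = {}
    # Process events in chronological order so the last state wins.
    for event in sorted(events, key=lambda e: parse_time(e["eventTime"])):
        props["name"] = event["job"]["name"]
        kind = event["eventType"]
        if kind == "START":
            props["airflowRunStartTime"] = parse_time(event["eventTime"])
        elif kind in TERMINAL_TYPES:
            props["airflowRunEndTime"] = parse_time(event["eventTime"])
        props["airflowRunOpenLineageState"] = kind
    return props


# Hypothetical events for a DAG named "daily_sales", deliberately out of order.
events = [
    {"eventType": "COMPLETE", "eventTime": "2024-05-01T06:05:30Z",
     "job": {"namespace": "airflow", "name": "daily_sales"}},
    {"eventType": "START", "eventTime": "2024-05-01T06:00:00Z",
     "job": {"namespace": "airflow", "name": "daily_sales"}},
]

run = summarize_run(events)
duration_seconds = (run["airflowRunEndTime"] - run["airflowRunStartTime"]).total_seconds()
```

In this sketch, the difference between the end and start timestamps gives a run duration in seconds, which is the kind of value a duration filter would compare against.
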
Did you know?

If a DAG has more than 10 valid owner email addresses (comma-separated), only the first 10 will be captured and published.
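
The owner limit can be pictured with a short sketch. The limit of 10 comes from the note above. Everything else is an assumption made for illustration: the `capture_owners` helper is hypothetical, and the regular expression is a deliberately simple stand-in for whatever validity check Atlan applies.

```python
import re

MAX_OWNERS = 10  # limit stated in the note above

# Simplistic email check, assumed for illustration only.
EMAIL_RE = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def capture_owners(owners, limit=MAX_OWNERS):
    """Return the first `limit` valid email addresses from a comma-separated string."""
    valid = []
    for part in owners.split(","):
        candidate = part.strip()
        if EMAIL_RE.match(candidate):
            valid.append(candidate)
    return valid[:limit]


# Twelve valid addresses plus one invalid entry (hypothetical data).
owners = ", ".join(f"user{i}@example.com" for i in range(12)) + ", not-an-email"
captured = capture_owners(owners)
```

Here `captured` holds `user0@example.com` through `user9@example.com`. The eleventh and twelfth addresses and the invalid entry are dropped.
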

Tasks

Atlan maps tasks from Apache Airflow/OpenLineage to its AirflowTask asset type.

Source property | Atlan property | Description
job.name (partial) | name | Name of the task (extracted from full job name)
- | qualifiedName | Unique identifier for the task in Atlan
- | airflowDagName | Name of the parent DAG
- | airflowDagQualifiedName | Unique identifier for the parent DAG in Atlan
operator_class | airflowTaskOperatorClass | Type of operator used for the task
conn_id | airflowTaskConnectionId | Connection ID used by the task
sql | airflowTaskSql | SQL query (for SQL-based operators)
owner | sourceOwners | Owner information from the task definition
eventTime (start) | airflowRunStartTime | Timestamp when the task started
eventTime (end) | airflowRunEndTime | Timestamp when the task completed
eventType | airflowRunOpenLineageState | Final status of the task run
run_id | airflowRunName | Unique identifier for the task run
run_type | airflowRunType | Type of run (from parent DAG)
pool | airflowTaskPool | Worker pool assigned to the task
pool_slots | airflowTaskPoolSlots | Number of pool slots used by the task
priority_weight | airflowTaskPriorityWeight | Priority weight for execution order
queue