What does Atlan crawl from Apache Spark/OpenLineage?

Once you have integrated Apache Spark/OpenLineage, you can use connector-specific filters for quick asset discovery. The following filters are currently supported:

  • Status filter - last run status for an asset
  • Duration filter - last run duration for an asset

Atlan maps the following assets and properties from Apache Spark/OpenLineage. Asset lineage support depends on the data sources that OpenLineage supports.

Jobs

Atlan maps jobs from Apache Spark to its SparkJob asset type. Atlan also supports column-level lineage for Spark jobs.

Source property | Atlan property | Description
--- | --- | ---
job.name | name | Name of the Spark job
- | qualifiedName | Unique identifier for the job in Atlan
Derived from job.name | sparkAppName | Name of the Spark application (the substring of job.name before the first '.')
spark.master | sparkMaster | Spark master URL (for example, yarn or local)
- | connectionQualifiedName | Unique identifier for the connector instance
- | connectorName | Name of the connector instance
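
As an illustration of the sparkAppName rule in the table above, the derivation could be sketched as follows. This is a minimal sketch, not Atlan's implementation: the function name `derive_spark_app_name` is hypothetical, and returning the whole name when job.name contains no '.' is an assumption.

```python
def derive_spark_app_name(job_name: str) -> str:
    """Return the part of an OpenLineage job name before the first '.'.

    Hypothetical helper for illustration only. If the name contains no '.',
    the whole name is returned (an assumption, not documented behavior).
    """
    # split with maxsplit=1 stops at the first '.', so later dots are ignored
    return job_name.split(".", 1)[0]


# Example: a job name made of an application name plus an operation suffix
print(derive_spark_app_name("my_app.execute_insert_into_hadoop_fs_relation_command"))
# prints "my_app"
```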

OpenLineage metadata

Atlan reports OpenLineage operational metadata for Spark jobs.

Source | Atlan property | Description
--- | --- | ---
run.runId | sparkRunId | Unique run identifier
run.facets.spark_version.spark-version | sparkRunVersion | Spark runtime version
run.facets.spark_version.openlineage-spark-version | sparkRunOpenLineageVersion | OpenLineage library version
START event timestamp | sparkRunStartTime | Job start time
COMPLETE/ABORT/FAIL event timestamp | sparkRunEndTime | Job end time
Final event type | sparkRunOpenLineageState | Status of the job (COMPLETE, FAIL, ABORT)
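
To make the mapping above concrete, the following sketch collapses the OpenLineage events of a single run into the Atlan properties listed in the table. It is an illustration under stated assumptions, not Atlan's implementation: the function name `summarize_run` is hypothetical, the events are assumed to be already-parsed JSON dictionaries with the standard `eventType`, `eventTime`, and `run` fields, and all timestamps are assumed to use the same ISO 8601 format and time zone so that they sort as strings.

```python
from typing import Any

# Event types that end a run, per the table above
TERMINAL_STATES = {"COMPLETE", "ABORT", "FAIL"}


def summarize_run(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Collapse the events of one run into Atlan-style run properties.

    Hypothetical helper for illustration only.
    """
    summary: dict[str, Any] = {}
    # Assumption: timestamps share one ISO 8601 format, so string order is time order
    for event in sorted(events, key=lambda e: e["eventTime"]):
        run = event.get("run", {})
        summary.setdefault("sparkRunId", run.get("runId"))

        # The spark_version facet may appear on only some events
        version = run.get("facets", {}).get("spark_version", {})
        if "spark-version" in version:
            summary["sparkRunVersion"] = version["spark-version"]
        if "openlineage-spark-version" in version:
            summary["sparkRunOpenLineageVersion"] = version["openlineage-spark-version"]

        event_type = event.get("eventType")
        if event_type == "START":
            # Keep the earliest START timestamp as the start time
            summary.setdefault("sparkRunStartTime", event["eventTime"])
        elif event_type in TERMINAL_STATES:
            # The last terminal event determines end time and final state
            summary["sparkRunEndTime"] = event["eventTime"]
            summary["sparkRunOpenLineageState"] = event_type
    return summary
```

Events are sorted before processing because delivery order is not guaranteed to match event time; the latest terminal event then decides the reported state.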