What does Atlan crawl from Apache Spark/OpenLineage?
Once you have integrated Apache Spark/OpenLineage, you can use connector-specific filters for quick asset discovery. The following filters are currently supported:
- Status filter - last run status for an asset
- Duration filter - last run duration for an asset
Atlan maps the following assets and properties from Apache Spark/OpenLineage. Asset lineage support depends on the data sources that OpenLineage supports.
Jobs
Atlan maps jobs from Apache Spark to its SparkJob asset type. Atlan also supports column-level lineage for Spark jobs.
Source property | Atlan property | Description |
---|---|---|
job.name | name | Name of the Spark job |
- | qualifiedName | Unique identifier for the job in Atlan |
Derived from job.name | sparkAppName | Name of the Spark application (substring before first '.') |
spark.master | sparkMaster | Spark master URL (for example, yarn or local) |
- | connectionQualifiedName | Unique identifier for the connector instance |
- | connectorName | Name of the connector instance |
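
The sparkAppName rule in the table above (substring before the first '.') can be illustrated with a short sketch. This is not Atlan's implementation: the function name `derive_spark_app_name` and the sample job name are hypothetical, and returning the whole name when it contains no '.' is an assumption.

```python
def derive_spark_app_name(job_name: str) -> str:
    """Return the part of job_name before the first '.'.

    Assumption: a job name without any '.' is returned unchanged.
    """
    # split with maxsplit=1 stops at the first '.', so later dots are ignored
    return job_name.split(".", 1)[0]


# Hypothetical job name, used only for illustration
example_job_name = "sales_etl.execute_insert_into_hadoop_fs_relation_command.orders"
print(derive_spark_app_name(example_job_name))  # prints "sales_etl"
```

Under this reading, every job emitted by the same Spark application shares one sparkAppName, which may make it a convenient value for grouping related jobs.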
OpenLineage metadata
Atlan reports OpenLineage operational metadata for Spark jobs.
Source property | Atlan property | Description |
---|---|---|
run.runId | sparkRunId | Unique run identifier |
run.facets.spark_version.spark-version | sparkRunVersion | Spark runtime version |
run.facets.spark_version.openlineage-spark-version | sparkRunOpenLineageVersion | OpenLineage library version |
START event timestamp | sparkRunStartTime | Job start time |
COMPLETE/ABORT/FAIL event timestamp | sparkRunEndTime | Job end time |
Final event type | sparkRunOpenLineageState | Status of the job (COMPLETE, FAIL, ABORT) |
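
To make the mapping above concrete, the following sketch collapses the OpenLineage events of a single run into the properties listed in the table. It is an illustration under stated assumptions, not Atlan's actual processing: the function name `summarize_run` is hypothetical, events are assumed to be plain dictionaries that follow the OpenLineage run event layout (`eventType`, `eventTime`, `run.runId`, `run.facets`), and they are assumed to arrive in chronological order.

```python
from typing import Any, Dict, List

# Event types that end a run, per the table above
TERMINAL_EVENT_TYPES = {"COMPLETE", "ABORT", "FAIL"}


def summarize_run(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Collapse the events of one run into the properties in the table.

    Assumption: events belong to the same run and are in chronological order.
    """
    summary: Dict[str, Any] = {}
    for event in events:
        run = event["run"]
        summary["sparkRunId"] = run["runId"]

        # The spark_version facet may be absent on some events
        version = run.get("facets", {}).get("spark_version", {})
        if "spark-version" in version:
            summary["sparkRunVersion"] = version["spark-version"]
        if "openlineage-spark-version" in version:
            summary["sparkRunOpenLineageVersion"] = version["openlineage-spark-version"]

        event_type = event["eventType"]
        if event_type == "START":
            summary["sparkRunStartTime"] = event["eventTime"]
        elif event_type in TERMINAL_EVENT_TYPES:
            # The last terminal event determines end time and final state
            summary["sparkRunEndTime"] = event["eventTime"]
            summary["sparkRunOpenLineageState"] = event_type
    return summary
```

In this sketch, a run that has emitted only a START event has no sparkRunEndTime or sparkRunOpenLineageState yet, since both come from the terminal event.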