Extract Databricks AI model lineage

Supported tracking servers only

Atlan supports lineage only for models tracked on Databricks-hosted MLflow tracking servers (where the tracking URI is databricks). External or self-hosted MLflow tracking servers aren't supported.
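As an illustrative check (this helper is hypothetical and not part of Atlan or MLflow), you can tell whether a tracking URI points at a Databricks-hosted server by its scheme: supported URIs are `databricks` or `databricks://<profile>`, while schemes like `https`, `file`, or `sqlite` indicate an external or self-hosted server:

```python
def is_databricks_tracking_uri(uri: str) -> bool:
    """Return True if the MLflow tracking URI is Databricks-hosted.

    Illustrative sketch: Databricks-hosted URIs are "databricks" or
    "databricks://<profile>"; any other scheme points at an external
    or self-hosted tracking server, which Atlan doesn't support.
    """
    scheme = uri.split("://", 1)[0]
    return scheme == "databricks"
```

For example, `is_databricks_tracking_uri("databricks://ml-profile")` is true, while a URI such as `https://mlflow.internal.example.com` is not supported.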

Once you have crawled Databricks AI models, Atlan can build lineage connecting those models to the upstream datasets, tables, and functions they depend on. This gives you end-to-end visibility into how data flows from source assets into trained model versions.

Prerequisites

Before extracting AI model lineage, make sure you have:

  - Crawled Databricks AI models with the Direct extraction strategy.
  - Models tracked on a Databricks-hosted MLflow tracking server (tracking URI `databricks`).

Extract lineage

Lineage is built automatically during the Databricks crawler run; no separate workflow is needed.

To extract AI model lineage:

  1. Make sure the Databricks crawler is configured for AI models with the Direct extraction strategy.

  2. If your models use Databricks Feature Store and the feature_spec.yaml artifact is stored in an external location, grant read access to the Atlan service account:

    GRANT READ FILES ON EXTERNAL LOCATION <external_location_name> TO <atlan_user_or_role>;

    If the artifact is inaccessible, Atlan skips Feature Store lineage for that model version and falls back to run-based lineage where available.

  3. Run the Databricks crawler workflow.

  4. After the workflow completes, navigate to any AI Model Version asset in Atlan to view its lineage. The lineage graph shows upstream tables, feature views, and functions that fed into the model version.
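The fallback behavior described in step 2 can be sketched as follows. This is an illustrative model of the documented behavior, not Atlan's actual implementation; the function and field names are hypothetical:

```python
def resolve_model_lineage(model_version: dict) -> list[str]:
    """Illustrative sketch of the documented fallback order:
    prefer Feature Store lineage from feature_spec.yaml, else
    fall back to run-based lineage where available."""
    # Feature Store lineage: upstream tables listed in the model
    # version's feature_spec.yaml artifact (parsed here into a dict).
    # If the artifact was inaccessible, feature_spec is None.
    feature_spec = model_version.get("feature_spec")
    if feature_spec is not None:
        return feature_spec.get("source_tables", [])
    # Run-based lineage: tables logged as inputs on the MLflow run
    # that produced this model version.
    return model_version.get("run_input_tables", [])
```

A model version whose `feature_spec.yaml` is readable yields Feature Store lineage; one whose artifact is inaccessible falls back to the tables logged on its run, and a version with neither yields no upstream lineage.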

Cross-workspace lineage for Databricks AI models isn't yet supported; tracing lineage across Databricks workspaces is planned for a future release.
