How Atlan builds lineage for Databricks AI models
This FAQ covers how Atlan constructs lineage for Databricks AI models, what relationships are created, and how assets like notebooks and functions are handled.
How does Atlan build lineage for Databricks AI models?
Atlan uses two sources to construct lineage for each model version:
- Run details: For model versions with an associated MLflow run, Atlan reads the input datasets logged to that run. Tables referenced as inputs are linked as upstream dependencies of the model version.
- Feature store artifacts: For feature engineering models, Atlan reads the feature_spec.yaml artifact to discover upstream feature views, tables, and functions used during training.
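The way these two sources combine can be illustrated with a minimal Python sketch. It is not Atlan's implementation: the `LineageEdge` class, the function names, and the example asset names are hypothetical, and the `input_tables`/`table_name` and `input_functions`/`udf_name` keys are an assumption about how a parsed feature_spec.yaml might look.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    """One upstream -> downstream relationship (hypothetical structure)."""
    upstream_type: str   # "Table" or "Function"
    upstream_name: str   # fully qualified name of the upstream asset
    downstream: str      # identifier of the AI model version
    source: str          # where the relationship was discovered


def edges_from_run_inputs(model_version, run_inputs):
    """Tables logged as input datasets on the MLflow run become upstream tables."""
    return [
        LineageEdge("Table", name, model_version,
                    "Input datasets in MLflow run details")
        for name in run_inputs
    ]


def edges_from_feature_spec(model_version, feature_spec):
    """Tables and functions listed in the parsed feature spec become upstream assets.

    The key names used here are assumed for illustration only.
    """
    edges = []
    for table in feature_spec.get("input_tables", []):
        edges.append(LineageEdge("Table", table["table_name"],
                                 model_version, "feature_spec.yaml artifact"))
    for function in feature_spec.get("input_functions", []):
        edges.append(LineageEdge("Function", function["udf_name"],
                                 model_version, "feature_spec.yaml artifact"))
    return edges


def build_lineage(model_version, run_inputs, feature_spec):
    """Combine both sources into one list of edges for a model version."""
    return (edges_from_run_inputs(model_version, run_inputs)
            + edges_from_feature_spec(model_version, feature_spec))


# Example with made-up asset names.
example_edges = build_lineage(
    model_version="ml.models.churn/1",
    run_inputs=["ml.raw.customers"],
    feature_spec={
        "input_tables": [{"table_name": "ml.features.customer_features"}],
        "input_functions": [{"udf_name": "ml.features.days_since_signup"}],
    },
)
for edge in example_edges:
    print(f"{edge.upstream_type} {edge.upstream_name} -> "
          f"{edge.downstream} ({edge.source})")
```

A model version with no MLflow run and no feature spec simply yields no edges in this sketch, since each source contributes independently.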
What lineage relationships does Atlan create for AI models?
| Upstream asset | Downstream asset | Source |
|---|---|---|
| Table | AI model version | Input datasets in MLflow run details |
| Table | AI model version | feature_spec.yaml artifact |
| Function | AI model version | feature_spec.yaml artifact |
Why do Notebook and Function assets appear in Atlan after lineage extraction?
When Atlan encounters a notebook or function during lineage extraction, it creates a corresponding Notebook or Function asset in Atlan so those assets can be navigated and governed alongside other data assets.
Why do notebook and job details appear in lineage without creating asset-to-asset edges?
When a model version is linked to a Databricks notebook, job, or project, Atlan captures those details as additional ETL context on the lineage process entity. This context is visible in the lineage panel but doesn't create separate asset-to-asset lineage edges.
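The distinction between context and edges can be pictured with a small Python sketch. This is an illustrative model only, not Atlan's data model or API: `LineageProcess`, `etl_context`, `asset_edges`, and the example names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LineageProcess:
    """Hypothetical process entity connecting inputs to outputs."""
    inputs: list
    outputs: list
    # Notebook, job, or project details live here as descriptive context.
    etl_context: dict = field(default_factory=dict)


def asset_edges(process):
    """Edges are derived from inputs and outputs only; context is ignored."""
    return [(upstream, downstream)
            for upstream in process.inputs
            for downstream in process.outputs]


process = LineageProcess(
    inputs=["ml.features.customer_features"],
    outputs=["ml.models.churn/1"],
    etl_context={"notebook": "/Users/example/train_churn", "job": "nightly-train"},
)

# The notebook and job appear as context on the process, not as edges.
print(asset_edges(process))
print(process.etl_context)
```

In this sketch, adding or removing entries in `etl_context` never changes what `asset_edges` returns, which mirrors the behavior described above.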
See also
- Extract Databricks AI model lineage: Configure and run lineage extraction for Databricks AI models.
- Permissions for Databricks AI models: Full reference of the privileges required for AI model crawling and lineage extraction.