Crawl Databricks AI models
Atlan can discover and catalog AI models—and their logged versions—registered in the Databricks Unity Catalog Model Registry. Once crawled, model assets are visible in Atlan alongside your other Databricks data assets. Model crawling requires the Direct extraction strategy and isn't supported with the Offline or Agent extraction strategies.
Prerequisites
Before crawling AI models, make sure you have:
- A Unity Catalog-enabled Databricks workspace
- Set up the Databricks connector in Atlan
- Crawled Databricks assets at least once
Permissions required
In addition to the standard Databricks connector permissions, the Atlan service account requires:
- Data Reader preset (or the individual privileges USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT) on all catalogs and schemas containing models
- CAN VIEW or CAN READ on all user notebooks and MLflow experiments linked to model versions. To cover all model versions without granting access notebook by notebook, grant CAN VIEW at the workspace level.
- READ FILES on any external location storing feature_spec.yaml artifacts, if your workspace uses Databricks Feature Store models with externally stored artifacts:
  GRANT READ FILES ON EXTERNAL LOCATION <external_location_name> TO <atlan_user_or_role>;
For the full breakdown of each privilege and what it enables, see Permissions for Databricks AI models.
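If you grant the individual privileges rather than the Data Reader preset, it can help to generate the statements for several catalogs at once. The following is a minimal sketch, not an Atlan or Databricks utility: the helper names, the catalog name `ml_prod`, the principal `atlan-sp`, and the external location `feature_specs` are all hypothetical. It assumes grants are made at the catalog level, where Unity Catalog lets the schemas inside inherit them; review the output before running it in your workspace.

```python
# Privileges taken from the list above; all other names are illustrative.
MODEL_PRIVILEGES = ["USE CATALOG", "USE SCHEMA", "EXECUTE", "READ VOLUME", "SELECT"]


def quote(name):
    """Backtick-quote an identifier or principal, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def model_grants(catalogs, principal, external_locations=()):
    """Build one catalog-level GRANT per catalog, plus optional READ FILES grants."""
    privileges = ", ".join(MODEL_PRIVILEGES)
    statements = [
        f"GRANT {privileges} ON CATALOG {quote(catalog)} TO {quote(principal)};"
        for catalog in catalogs
    ]
    # Only needed when feature_spec.yaml artifacts live in an external location.
    statements += [
        f"GRANT READ FILES ON EXTERNAL LOCATION {quote(location)} TO {quote(principal)};"
        for location in external_locations
    ]
    return statements


if __name__ == "__main__":
    # Hypothetical names; substitute your own catalogs and service principal.
    for statement in model_grants(["ml_prod"], "atlan-sp", ["feature_specs"]):
        print(statement)
```

The notebook and MLflow experiment permissions (CAN VIEW or CAN READ) are workspace object permissions rather than SQL grants, so this sketch does not cover them.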
Configure crawler
To configure the crawler for AI models:
- Follow the standard Crawl Databricks steps.
- When selecting the extraction strategy, choose Direct.
- For the extraction method, select System Tables; the REST API method is deprecated. System Tables supports all authentication types: personal access token, AWS service principal, and Azure service principal.
- Under asset filters, specify which catalogs or schemas to crawl:
  - To include specific catalogs or schemas, click Include Metadata.
  - To exclude specific catalogs or schemas, click Exclude Metadata.
  - If no filters are set, Atlan crawls all catalogs and schemas accessible to the service account.
- Run the workflow.
After the workflow completes, AI Model and AI Model Version assets appear in Atlan under the crawled catalog and schema.
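The two choices that matter most for model crawling, the extraction strategy and the extraction method, can be captured in a small pre-flight check. This is only a sketch of the rules stated on this page. The function name and the idea of passing the settings as strings are assumptions made for illustration, and it does not call any Atlan API.

```python
def check_model_crawl_settings(strategy, method):
    """Return a list of problems with the chosen settings (empty if none)."""
    problems = []
    # Model crawling is only supported with the Direct extraction strategy.
    if strategy.strip().lower() != "direct":
        problems.append(
            f"Extraction strategy '{strategy}' does not support model crawling; choose Direct."
        )
    # The REST API method is deprecated in favor of System Tables.
    normalized_method = method.strip().lower()
    if normalized_method == "rest api":
        problems.append("The REST API method is deprecated; select System Tables.")
    elif normalized_method != "system tables":
        problems.append(f"Unrecognized extraction method '{method}'; select System Tables.")
    return problems


if __name__ == "__main__":
    for problem in check_model_crawl_settings("Offline", "REST API"):
        print(problem)
```

An empty list means the settings match the recommendations above. It does not confirm that the service account has the required permissions.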
See also
- Extract Databricks AI model lineage: Build lineage between your Databricks models and the upstream data assets they depend on.
- What does Atlan crawl from Databricks: Full reference of all Databricks assets and properties crawled by Atlan, including AI models.