
Crawl Databricks AI models

Atlan can discover and catalog AI models—and their logged versions—registered in the Databricks Unity Catalog Model Registry. Once crawled, model assets are visible in Atlan alongside your other Databricks data assets. Model crawling requires the Direct extraction strategy and isn't supported with the Offline or Agent extraction strategies.

Prerequisites

Before crawling AI models, make sure the permissions below are in place.

Permissions required

In addition to the standard Databricks connector permissions, the Atlan service account requires:

  • Data Reader preset (or the individual privileges USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, and SELECT) on all catalogs and schemas containing models
  • CAN VIEW or CAN READ on all user notebooks and MLflow experiments linked to model versions. To cover all model versions without granting access notebook by notebook, grant CAN VIEW at the workspace level.
  • READ FILES on any external location storing feature_spec.yaml artifacts, if your workspace uses Databricks Feature Store models with externally stored artifacts:
    GRANT READ FILES ON EXTERNAL LOCATION <external_location_name> TO <atlan_user_or_role>;
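
If you grant the individual privileges from the first bullet rather than the Data Reader preset, one way to do it is a single statement at the catalog level. This is a sketch, not the required setup. It assumes Unity Catalog privilege inheritance, where a privilege granted on a catalog applies to the schemas and objects inside it. `<catalog_name>` is a placeholder that doesn't appear elsewhere on this page; repeat the statement for each catalog that contains models. Principal names with special characters, such as an email address, may need to be wrapped in backticks.

```sql
-- Placeholders (assumptions): <catalog_name>, <atlan_user_or_role>.
-- Grants the individual privileges listed above on one catalog;
-- schemas and objects in that catalog inherit them.
GRANT USE CATALOG, USE SCHEMA, EXECUTE, READ VOLUME, SELECT
  ON CATALOG <catalog_name> TO <atlan_user_or_role>;

-- Optional check: list the privileges the principal holds on the catalog.
SHOW GRANTS <atlan_user_or_role> ON CATALOG <catalog_name>;
```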

For the full breakdown of each privilege and what it enables, see Permissions for Databricks AI models.

Configure crawler

To configure the crawler for AI models:

  1. Follow the standard Crawl Databricks steps.
  2. When selecting the extraction strategy, choose Direct.
  3. For the extraction method, select System Tables; the REST API method is deprecated. System Tables supports all authentication types: personal access token, AWS service principal, and Azure service principal.
  4. Under asset filters, specify which catalogs or schemas to crawl:
    • To include specific catalogs or schemas, click Include Metadata.
    • To exclude specific catalogs or schemas, click Exclude Metadata.
    • If no filters are set, Atlan crawls all catalogs and schemas accessible to the service account.
  5. Run the workflow.

After the workflow completes, AI Model and AI Model Version assets appear in Atlan under the crawled catalog and schema.
