Set up cross-workspace extraction

Eliminate the need for separate crawler configurations by using a single service principal to crawl metadata from all workspaces within a Databricks metastore. This guide walks you through configuring the necessary permissions to enable cross-workspace extraction.

Important!

Cross-workspace extraction isn't supported for REST API or JDBC extraction methods.

Prerequisites

Before you begin, make sure you have:

  • A Unity Catalog-enabled Databricks workspace
  • Account admin access to create and manage service principals
  • Workspace admin access to grant permissions across all target workspaces
  • At least one active SQL warehouse in each workspace you intend to crawl
  • Databricks authentication set up using one of the supported authentication methods (see Set up Databricks authentication)
  • System table extraction enabled for lineage and usage extraction

Add service principal to all workspaces

You must use a single, common service principal that has been granted access to all Databricks workspaces you intend to crawl within the metastore.

  1. Log in to your Databricks account console as an account admin
  2. From the left menu, click Workspaces and select a workspace
  3. From the tabs along the top, click the Permissions tab
  4. In the upper right, click Add permissions
  5. In the Add permissions dialog:
    • For User, group, or service principal, select your service principal
    • For Permission, select workspace User
    • Click Add
  6. Repeat steps 2-5 for each workspace you intend to crawl

Permissions required

To enable cross-workspace extraction, the service principal needs the following permissions in each workspace from which you want Atlan to extract metadata:

  • CAN_USE on SQL warehouses in each workspace
  • USE CATALOG on system catalog
  • USE SCHEMA on system.access (for cross-workspace discovery)
  • USE SCHEMA on system.information_schema
  • SELECT on the following system tables:
    • system.access.workspace_latest (for cross-workspace discovery)
    • system.information_schema.catalogs
    • system.information_schema.schemata
    • system.information_schema.tables
    • system.information_schema.columns
    • system.information_schema.key_column_usage
    • system.information_schema.table_constraints
  • BROWSE on all catalogs you want to crawl
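
Once these grants are in place, cross-workspace discovery works by querying the system.access.workspace_latest table directly. A minimal sketch of such a query follows; the column names workspace_id and workspace_name are assumptions based on typical versions of this system table and may differ in your Databricks release:

```sql
-- List the workspaces visible to the service principal.
-- Column names are assumptions; run
-- DESCRIBE TABLE system.access.workspace_latest to confirm the schema.
SELECT workspace_id, workspace_name
FROM system.access.workspace_latest;
```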

Grant permissions

Configure the necessary permissions for the service principal to access and extract metadata from all workspaces within the metastore.

  1. SQL warehouse permissions: The service principal must have usage permissions on at least one active SQL warehouse in each workspace. The extractor uses the smallest available warehouse to run its discovery queries.

    1. Connect to your Databricks workspace using a SQL client or the SQL editor

    2. Run the following command for each workspace, replacing the placeholders:

      GRANT CAN_USE ON WAREHOUSE <warehouse_name> TO `<service_principal_id>`;
      • Replace <warehouse_name> with your actual warehouse name
      • Replace <service_principal_id> with your service principal's application ID
      Example
      GRANT CAN_USE ON WAREHOUSE production-warehouse TO `12345678-1234-1234-1234-123456789012`;
  2. System table permissions: Access to the system schema is essential for workspace and lineage discovery.

    1. Connect to your Databricks workspace using a SQL client or the SQL editor

    2. Grant system catalog access:

      GRANT USE CATALOG ON CATALOG system TO `<service_principal_id>`;
    3. Grant schema-level permissions:

      GRANT USE SCHEMA ON SCHEMA system.access TO `<service_principal_id>`;
      GRANT USE SCHEMA ON SCHEMA system.information_schema TO `<service_principal_id>`;
    4. Grant SELECT permissions on required system tables:

      -- For cross-workspace discovery
      GRANT SELECT ON TABLE system.access.workspace_latest TO `<service_principal_id>`;

      -- For metadata extraction
      GRANT SELECT ON TABLE system.information_schema.catalogs TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.schemata TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.tables TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.columns TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.key_column_usage TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.table_constraints TO `<service_principal_id>`;
      • Replace <service_principal_id> with your service principal's application ID
    Example
    GRANT USE CATALOG ON CATALOG system TO `12345678-1234-1234-1234-123456789012`;
    GRANT USE SCHEMA ON SCHEMA system.access TO `12345678-1234-1234-1234-123456789012`;
    GRANT USE SCHEMA ON SCHEMA system.information_schema TO `12345678-1234-1234-1234-123456789012`;
    GRANT SELECT ON TABLE system.access.workspace_latest TO `12345678-1234-1234-1234-123456789012`;
    GRANT SELECT ON TABLE system.information_schema.catalogs TO `12345678-1234-1234-1234-123456789012`;
  3. Asset permissions: The service principal requires BROWSE permissions on all catalogs you want to crawl. BROWSE lets the service principal see and read metadata for all data assets within the catalog, including all of its schemas and tables.

    Important!

    For private catalogs, grant permissions from each workspace. For public catalogs, grant them from any workspace. Catalogs appear in the system tables only when the service principal has BROWSE privileges on each individual catalog.

    1. Connect to your Databricks workspace using a SQL client or the SQL editor

    2. Grant BROWSE permissions on each catalog you want to crawl:

      GRANT BROWSE ON CATALOG <catalog_name> TO `<service_principal_id>`;
      • Replace <catalog_name> with your actual catalog name
      • Replace <service_principal_id> with your service principal's application ID
    Example
    GRANT BROWSE ON CATALOG main TO `12345678-1234-1234-1234-123456789012`;
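
After completing the grants above, you can optionally verify them before running the crawler. The sketch below uses the example service principal ID from this guide; SHOW GRANTS assumes Unity Catalog, and the SELECT statements should be run while authenticated as the service principal:

```sql
-- Review the privileges granted to the service principal on the system catalog.
SHOW GRANTS `12345678-1234-1234-1234-123456789012` ON CATALOG system;

-- Confirm the service principal can read the discovery and metadata tables
-- (run these as the service principal itself).
SELECT COUNT(*) FROM system.access.workspace_latest;
SELECT COUNT(*) FROM system.information_schema.tables;
```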

Next steps

  • Crawl Databricks - Set up and run a workflow to extract metadata from your Databricks instance using direct, offline, or agent extraction methods