Skip to main content

Set up cross-workspace extraction

Eliminate the need for separate crawler configurations by using a single service principal to crawl metadata from all workspaces within a Databricks metastore. This guide walks you through configuring the necessary permissions to enable cross-workspace extraction.

Important!

Cross-workspace extraction isn't supported for REST API or JDBC extraction methods.

Prerequisites

Before you begin, make sure you have:

  • A Unity Catalog-enabled Databricks workspace
  • Account admin access to create and manage service principals
  • Workspace admin access to grant permissions across all target workspaces
  • At least one active SQL warehouse in each workspace you intend to crawl
  • Set up Databricks authentication completed with one of the supported authentication methods
  • System table extraction enabled for lineage and usage extraction

Permissions required

The service principal needs the following permissions to enable cross-workspace extraction:

  • CAN_USE on SQL warehouses in each workspace
  • SELECT on system.access.workspace_latest table
  • USE CATALOG, BROWSE, and SELECT on all catalogs you want to crawl

Add service principal to all workspaces

You must use a single, common service principal that has been granted access to all Databricks workspaces you intend to crawl within the metastore.

  1. Log in to your Databricks account console as an account admin
  2. From the left menu, click Workspaces and select a workspace
  3. From the tabs along the top, click the Permissions tab
  4. In the upper right, click Add permissions
  5. In the Add permissions dialog:
    • For User, group, or service principal, select your service principal
    • For Permission, select workspace User
    • Click Add
  6. Repeat steps 2-5 for each workspace you intend to crawl

Grant permissions

Configure the necessary permissions for the service principal to access and extract metadata from all workspaces within the metastore.

  1. SQL workspace permissions: The service principal must have usage permissions on at least one active SQL warehouse within each workspace. The extractor uses the smallest available warehouse to run its discovery queries.

    1. Connect to your Databricks workspace using a SQL client or the SQL editor

    2. Run the following command for each workspace, replacing the placeholders:

      GRANT CAN_USE ON WAREHOUSE <warehouse_name> TO `<service_principal_id>`;
      • Replace <warehouse_name> with your actual warehouse name
      • Replace <service_principal_id> with your service principal's application ID
      Example
      GRANT CAN_USE ON WAREHOUSE production-warehouse TO `12345678-1234-1234-1234-123456789012`;
  2. System table permissions: Access to the system schema is essential for workspace and lineage discovery.

    1. Connect to your Databricks workspace using a SQL client or the SQL editor

    2. Run the following command, replacing the placeholder:

      GRANT SELECT ON TABLE system.access.workspace_latest TO `<service_principal_id>`;
      • Replace <service_principal_id> with your service principal's application ID
      Example
      GRANT SELECT ON TABLE system.access.workspace_latest TO `12345678-1234-1234-1234-123456789012`;
  3. Asset permissions: The service principal requires permissions to "see" and "read" the metadata for all data assets you wish to extract. These grants must be applied to all private, public, and shared catalogs that are in scope for crawling.

    Important!

    For private catalogs, grant permissions from each workspace. For public catalogs, grant from any workspace.

    1. Connect to your Databricks workspace using a SQL client or the SQL editor

    2. Grant catalog-level permissions (required even when using BROWSE - BROWSE automatically grants access to all schemas and tables):

      GRANT USE CATALOG ON CATALOG <catalog_name> TO `<service_principal_id>`;
      GRANT BROWSE ON CATALOG <catalog_name> TO `<service_principal_id>`;
      • Replace <catalog_name> with your actual catalog name
      • Replace <service_principal_id> with your service principal's application ID
    3. If not using BROWSE, along with catalog permissions, grant additional permissions:

      • Grant schema-level permissions:

        GRANT USE SCHEMA ON SCHEMA <catalog_name>.<schema_name> TO `<service_principal_id>`;
        • Replace <catalog_name> and <schema_name> with your actual values
        • Replace <service_principal_id> with your service principal's application ID
      • Grant table-level permissions:

        GRANT SELECT ON ALL TABLES IN SCHEMA <catalog_name>.<schema_name> TO `<service_principal_id>`;
        • Replace <catalog_name> and <schema_name> with your actual values
        • Replace <service_principal_id> with your service principal's application ID
    Example
    GRANT USE CATALOG ON CATALOG main TO `12345678-1234-1234-1234-123456789012`;
    GRANT BROWSE ON CATALOG main TO `12345678-1234-1234-1234-123456789012`;

Need help?

Next steps