Set up cross-workspace extraction

Eliminate the need for separate crawler configurations by using a single service principal to crawl metadata from all workspaces within a Databricks metastore. This guide walks you through configuring the necessary permissions to enable cross-workspace extraction.

Important!

Cross-workspace extraction isn't supported for REST API or JDBC extraction methods.

Prerequisites

Before you begin, make sure you have:

  • A Unity Catalog-enabled Databricks workspace
  • Account admin access to create and manage service principals
  • Workspace admin access to grant permissions across all target workspaces
  • At least one active SQL warehouse in each workspace you intend to crawl
  • Databricks authentication set up using one of the supported authentication methods (see Set up Databricks authentication)
  • System table extraction enabled for lineage and usage extraction

Add service principal to all workspaces

You must use a single, common service principal that has been granted access to all Databricks workspaces you intend to crawl within the metastore.

  1. Log in to your Databricks account console as an account admin
  2. From the left menu, click Workspaces and select a workspace
  3. From the tabs along the top, click the Permissions tab
  4. In the upper right, click Add permissions
  5. In the Add permissions dialog:
    • For User, group, or service principal, select your service principal
    • For Permission, select workspace User
    • Click Add
  6. Repeat steps 2-5 for each workspace you intend to crawl

Permissions required

To enable cross-workspace extraction, the service principal needs the following permissions in each workspace from which you want Atlan to extract metadata:

  • CAN_USE on SQL warehouses in each workspace
  • USE CATALOG on system catalog
  • USE SCHEMA on system.access (for cross-workspace discovery)
  • USE SCHEMA on system.information_schema
  • SELECT on the following system tables:
    • system.access.workspace_latest (for cross-workspace discovery)
    • system.information_schema.catalogs
    • system.information_schema.schemata
    • system.information_schema.tables
    • system.information_schema.columns
    • system.information_schema.key_column_usage
    • system.information_schema.table_constraints
  • BROWSE on all catalogs you want to crawl
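
Once these grants are in place, cross-workspace discovery works by querying the system.access.workspace_latest table directly. A minimal sketch of such a query follows; the column names workspace_id and workspace_name are assumptions based on typical versions of this system table and may differ in your Databricks release:

```sql
-- List the workspaces visible to the service principal.
-- Column names are assumptions; run
-- DESCRIBE TABLE system.access.workspace_latest to confirm the schema.
SELECT workspace_id, workspace_name
FROM system.access.workspace_latest;
```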

Grant permissions

Configure the necessary permissions for the service principal to access and extract metadata from all workspaces within the metastore.

  1. SQL warehouse permissions: The service principal must have usage permissions on at least one active SQL warehouse in each workspace. The extractor uses the smallest available warehouse to run its discovery queries.

    1. Connect to your Databricks workspace using a SQL client or the SQL editor

    2. Run the following command for each workspace, replacing the placeholders:

      GRANT CAN_USE ON WAREHOUSE <warehouse_name> TO `<service_principal_id>`;
      • Replace <warehouse_name> with your actual warehouse name
      • Replace <service_principal_id> with your service principal's application ID
      Example
      GRANT CAN_USE ON WAREHOUSE production-warehouse TO `12345678-1234-1234-1234-123456789012`;
  2. System table permissions: Access to the system schema is essential for workspace and lineage discovery.

    1. Connect to your Databricks workspace using a SQL client or the SQL editor

    2. Grant system catalog access:

      GRANT USE CATALOG ON CATALOG system TO `<service_principal_id>`;
    3. Grant schema-level permissions:

      GRANT USE SCHEMA ON SCHEMA system.access TO `<service_principal_id>`;
      GRANT USE SCHEMA ON SCHEMA system.information_schema TO `<service_principal_id>`;
    4. Grant SELECT permissions on required system tables:

      -- For cross-workspace discovery
      GRANT SELECT ON TABLE system.access.workspace_latest TO `<service_principal_id>`;

      -- For metadata extraction
      GRANT SELECT ON TABLE system.information_schema.catalogs TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.schemata TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.tables TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.columns TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.key_column_usage TO `<service_principal_id>`;
      GRANT SELECT ON TABLE system.information_schema.table_constraints TO `<service_principal_id>`;
      • Replace <service_principal_id> with your service principal's application ID
    Example
    GRANT USE CATALOG ON CATALOG system TO `12345678-1234-1234-1234-123456789012`;
    GRANT USE SCHEMA ON SCHEMA system.access TO `12345678-1234-1234-1234-123456789012`;
    GRANT USE SCHEMA ON SCHEMA system.information_schema TO `12345678-1234-1234-1234-123456789012`;
    GRANT SELECT ON TABLE system.access.workspace_latest TO `12345678-1234-1234-1234-123456789012`;
    GRANT SELECT ON TABLE system.information_schema.catalogs TO `12345678-1234-1234-1234-123456789012`;
  3. Asset permissions: The service principal requires BROWSE permissions on all catalogs you want to crawl. BROWSE lets the service principal see and read metadata for all data assets within the catalog, including all of its schemas and tables.

    Important!

    For private catalogs, grant permissions from each workspace. For public catalogs, grant them from any workspace. Catalogs appear in the system tables only when the service principal has BROWSE privileges on each individual catalog.

    1. Connect to your Databricks workspace using a SQL client or the SQL editor

    2. Grant BROWSE permissions on each catalog you want to crawl:

      GRANT BROWSE ON CATALOG <catalog_name> TO `<service_principal_id>`;
      • Replace <catalog_name> with your actual catalog name
      • Replace <service_principal_id> with your service principal's application ID
    Example
    GRANT BROWSE ON CATALOG main TO `12345678-1234-1234-1234-123456789012`;
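
After completing the grants above, you can optionally verify them before running the crawler. The sketch below uses the example service principal ID from this guide; SHOW GRANTS assumes Unity Catalog, and the SELECT statements should be run while authenticated as the service principal:

```sql
-- Review the privileges granted to the service principal on the system catalog.
SHOW GRANTS `12345678-1234-1234-1234-123456789012` ON CATALOG system;

-- Confirm the service principal can read the discovery and metadata tables
-- (run these as the service principal itself).
SELECT COUNT(*) FROM system.access.workspace_latest;
SELECT COUNT(*) FROM system.information_schema.tables;
```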

Next steps

  • Crawl Databricks - Set up and run a workflow to extract metadata from your Databricks instance using direct, offline, or agent extraction methods