Set up cross-workspace extraction
Eliminate the need for separate crawler configurations by using a single service principal to crawl metadata from all workspaces within a Databricks metastore. This guide walks you through configuring the necessary permissions to enable cross-workspace extraction.
Cross-workspace extraction isn't supported for REST API or JDBC extraction methods.
Prerequisites
Before you begin, make sure you have:
- A Unity Catalog-enabled Databricks workspace
- Account admin access to create and manage service principals
- Workspace admin access to grant permissions across all target workspaces
- At least one active SQL warehouse in each workspace you intend to crawl
- Databricks authentication set up with one of the supported authentication methods
- System table extraction enabled for lineage and usage extraction
Add service principal to all workspaces
You must use a single, common service principal that has been granted access to all Databricks workspaces you intend to crawl within the metastore.
1. Log in to your Databricks account console as an account admin
2. From the left menu, click Workspaces and select a workspace
3. From the tabs along the top, click the Permissions tab
4. In the upper right, click Add permissions
5. In the Add permissions dialog:
   - For User, group, or service principal, select your service principal
   - For Permission, select workspace User
   - Click Add
6. Repeat steps 2-5 for each workspace you intend to crawl
Permissions required
To enable cross-workspace extraction, the service principal needs the following permissions in each workspace from which you want Atlan to extract metadata:
- CAN_USE on SQL warehouses in each workspace
- USE CATALOG on the system catalog
- USE SCHEMA on system.access (for cross-workspace discovery)
- USE SCHEMA on system.information_schema
- SELECT on the following system tables:
  - system.access.workspace_latest (for cross-workspace discovery)
  - system.information_schema.catalogs
  - system.information_schema.schemata
  - system.information_schema.tables
  - system.information_schema.columns
  - system.information_schema.key_column_usage
  - system.information_schema.table_constraints
- BROWSE on all catalogs you want to crawl
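The permission list above can also be kept as data and compared against what has already been granted. The following is a minimal Python sketch, not part of Atlan or Databricks tooling: the function names, the `(privilege, securable)` tuple format, and the sample warehouse and catalog names are all assumptions made for illustration. It only compares sets that you supply; it doesn't query Databricks.

```python
# Hypothetical checklist helper: encodes the required privileges listed above
# as (privilege, securable) pairs and reports which ones are still missing.

SYSTEM_TABLES = [
    "system.access.workspace_latest",
    "system.information_schema.catalogs",
    "system.information_schema.schemata",
    "system.information_schema.tables",
    "system.information_schema.columns",
    "system.information_schema.key_column_usage",
    "system.information_schema.table_constraints",
]


def required_privileges(warehouses, catalogs):
    """Build the full set of required (privilege, securable) pairs."""
    required = {("CAN_USE", f"WAREHOUSE {w}") for w in warehouses}
    required |= {
        ("USE CATALOG", "CATALOG system"),
        ("USE SCHEMA", "SCHEMA system.access"),
        ("USE SCHEMA", "SCHEMA system.information_schema"),
    }
    required |= {("SELECT", f"TABLE {t}") for t in SYSTEM_TABLES}
    required |= {("BROWSE", f"CATALOG {c}") for c in catalogs}
    return required


def missing_privileges(required, granted):
    """Return the required pairs that are absent from `granted`, sorted."""
    return sorted(set(required) - set(granted))


if __name__ == "__main__":
    # Sample names are placeholders, not values from a real workspace.
    required = required_privileges(["production-warehouse"], ["main"])
    granted = required - {("BROWSE", "CATALOG main")}
    for privilege, securable in missing_privileges(required, granted):
        print(f"Missing: {privilege} on {securable}")
```

Keeping the list as data makes it easier to review one workspace at a time, since the same checklist applies to each workspace you intend to crawl.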
Grant permissions
Configure the necessary permissions for the service principal to access and extract metadata from all workspaces within the metastore.
- SQL warehouse permissions: The service principal must have usage permissions on at least one active SQL warehouse within each workspace. The extractor uses the smallest available warehouse to run its discovery queries.

  Via SQL:
  1. Connect to your Databricks workspace using a SQL client or the SQL editor
  2. Run the following command for each workspace, replacing the placeholders:
     GRANT CAN_USE ON WAREHOUSE <warehouse_name> TO `<service_principal_id>`;
     - Replace <warehouse_name> with your actual warehouse name. Enclose the name in backticks if it contains hyphens or other special characters.
     - Replace <service_principal_id> with your service principal's application ID
     Example:
     GRANT CAN_USE ON WAREHOUSE `production-warehouse` TO `12345678-1234-1234-1234-123456789012`;

  Via UI:
  1. Log in to your Databricks workspace as a workspace admin
  2. From the left menu, click SQL Warehouses
  3. On the Compute page, for each SQL warehouse, click the 3-dot icon and then click Permissions
  4. In the Manage permissions dialog:
     - In the Type to add multiple users or groups field, search for and select your service principal
     - Select Can use permission
     - Click Add to assign the permission
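Because the warehouse grant has to be repeated in every workspace, it can help to generate the statements from a list rather than type them by hand. The sketch below is a hypothetical Python helper (the function names and the workspace labels are assumptions, not part of any Atlan or Databricks API). It follows the GRANT template shown in this guide and additionally wraps warehouse names in backticks, which is an assumption added here because names such as production-warehouse contain a hyphen. It also rejects a workspace with no warehouse, mirroring the prerequisite of at least one SQL warehouse per workspace.

```python
# Hypothetical generator for the per-workspace warehouse grants described above.


def quote_identifier(name):
    """Wrap a name in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def warehouse_grants(warehouses_by_workspace, service_principal_id):
    """Return {workspace: [GRANT statements]} for the given service principal.

    Raises ValueError if a workspace lists no warehouse, because the
    service principal needs at least one warehouse in every workspace.
    """
    principal = quote_identifier(service_principal_id)
    grants = {}
    for workspace, warehouses in warehouses_by_workspace.items():
        if not warehouses:
            raise ValueError(f"No SQL warehouse listed for workspace {workspace!r}")
        grants[workspace] = [
            f"GRANT CAN_USE ON WAREHOUSE {quote_identifier(w)} TO {principal};"
            for w in warehouses
        ]
    return grants


if __name__ == "__main__":
    # Workspace labels and warehouse names are placeholders.
    plan = warehouse_grants(
        {"workspace-a": ["production-warehouse"], "workspace-b": ["analytics"]},
        "12345678-1234-1234-1234-123456789012",
    )
    for workspace, statements in plan.items():
        print(f"-- Run in {workspace}")
        print("\n".join(statements))
```

The output is plain SQL text to paste into the SQL editor of each workspace; the script doesn't connect to Databricks itself.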
- System table permissions: Access to the system schema is essential for workspace and lineage discovery.

  Via SQL:
  1. Connect to your Databricks workspace using a SQL client or the SQL editor
  2. Grant system catalog access:
     GRANT USE CATALOG ON CATALOG system TO `<service_principal_id>`;
  3. Grant schema-level permissions:
     GRANT USE SCHEMA ON SCHEMA system.access TO `<service_principal_id>`;
     GRANT USE SCHEMA ON SCHEMA system.information_schema TO `<service_principal_id>`;
  4. Grant SELECT permissions on required system tables:
     -- For cross-workspace discovery
     GRANT SELECT ON TABLE system.access.workspace_latest TO `<service_principal_id>`;
     -- For metadata extraction
     GRANT SELECT ON TABLE system.information_schema.catalogs TO `<service_principal_id>`;
     GRANT SELECT ON TABLE system.information_schema.schemata TO `<service_principal_id>`;
     GRANT SELECT ON TABLE system.information_schema.tables TO `<service_principal_id>`;
     GRANT SELECT ON TABLE system.information_schema.columns TO `<service_principal_id>`;
     GRANT SELECT ON TABLE system.information_schema.key_column_usage TO `<service_principal_id>`;
     GRANT SELECT ON TABLE system.information_schema.table_constraints TO `<service_principal_id>`;
     - Replace <service_principal_id> with your service principal's application ID
     Example:
     GRANT USE CATALOG ON CATALOG system TO `12345678-1234-1234-1234-123456789012`;
     GRANT USE SCHEMA ON SCHEMA system.access TO `12345678-1234-1234-1234-123456789012`;
     GRANT USE SCHEMA ON SCHEMA system.information_schema TO `12345678-1234-1234-1234-123456789012`;
     GRANT SELECT ON TABLE system.access.workspace_latest TO `12345678-1234-1234-1234-123456789012`;
     GRANT SELECT ON TABLE system.information_schema.catalogs TO `12345678-1234-1234-1234-123456789012`;

  Via UI:
  1. Log in to your Databricks workspace as a workspace admin
  2. From the left menu, click Catalog
  3. In the Catalog Explorer, click on the system catalog
  4. Click the Permissions tab and then click Grant
  5. In the Grant permissions dialog:
     - Under Principals, select your service principal
     - Under Privileges, check USE CATALOG
     - Click Grant to apply the permissions
  6. Navigate to system > access
  7. Click the Permissions tab and then click Grant
  8. In the Grant permissions dialog:
     - Under Principals, select your service principal
     - Under Privileges, check USE SCHEMA
     - Click Grant
  9. Repeat for system > information_schema
  10. For each required system table, navigate to the table and grant SELECT permissions:
      - system.access.workspace_latest
      - system.information_schema.catalogs
      - system.information_schema.schemata
      - system.information_schema.tables
      - system.information_schema.columns
      - system.information_schema.key_column_usage
      - system.information_schema.table_constraints
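The system catalog grants come to ten statements in total, so one option is to render them from the schema and table lists for a given application ID. This is a minimal Python sketch under that assumption; `system_grants` is a hypothetical helper name, and the statements it produces are the ones listed in this guide with the placeholder filled in.

```python
# Hypothetical renderer for the system catalog, schema, and table grants above.

SYSTEM_SCHEMAS = ["system.access", "system.information_schema"]

SYSTEM_TABLES = [
    "system.access.workspace_latest",              # cross-workspace discovery
    "system.information_schema.catalogs",          # metadata extraction
    "system.information_schema.schemata",
    "system.information_schema.tables",
    "system.information_schema.columns",
    "system.information_schema.key_column_usage",
    "system.information_schema.table_constraints",
]


def system_grants(service_principal_id):
    """Return the GRANT statements in catalog, schema, table order."""
    principal = f"`{service_principal_id}`"
    statements = [f"GRANT USE CATALOG ON CATALOG system TO {principal};"]
    statements += [
        f"GRANT USE SCHEMA ON SCHEMA {schema} TO {principal};"
        for schema in SYSTEM_SCHEMAS
    ]
    statements += [
        f"GRANT SELECT ON TABLE {table} TO {principal};"
        for table in SYSTEM_TABLES
    ]
    return statements


if __name__ == "__main__":
    # The application ID below is the sample value used in this guide.
    print("\n".join(system_grants("12345678-1234-1234-1234-123456789012")))
```

Generating the script this way keeps the catalog, schema, and table grants in the same order as the manual steps, which makes it straightforward to compare against what was actually run.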
- Asset permissions: The service principal requires BROWSE permissions on all catalogs you want to crawl. The BROWSE permission lets the service principal see and read metadata for all data assets within the catalog, automatically granting access to all schemas and tables.

  Important! For private catalogs, grant permissions from each workspace. For public catalogs, grant from any workspace. Catalogs are visible in the system tables only when the service principal has BROWSE privileges on each individual catalog.

  Via SQL:
  1. Connect to your Databricks workspace using a SQL client or the SQL editor
  2. Grant BROWSE permissions on each catalog you want to crawl:
     GRANT BROWSE ON CATALOG <catalog_name> TO `<service_principal_id>`;
     - Replace <catalog_name> with your actual catalog name
     - Replace <service_principal_id> with your service principal's application ID
     Example:
     GRANT BROWSE ON CATALOG main TO `12345678-1234-1234-1234-123456789012`;

  Via UI:
  1. Log in to your Databricks workspace as a workspace admin
  2. From the left menu, click Catalog
  3. In the Catalog Explorer, navigate to the catalog you want to grant permissions on (for example, main)
  4. Click the Permissions tab and then click Grant
  5. In the Grant permissions dialog:
     - Under Principals, select your service principal
     - Under Privileges, check BROWSE
     - Click Grant to apply the permissions
  6. Repeat steps 3-5 for each catalog you want to crawl in Atlan
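The rule for where to run the BROWSE grants (every workspace for private catalogs, any one workspace for public catalogs) can be sketched as a small planning helper. The Python below is illustrative only: `browse_grant_plan`, the `is_private` flag, and the workspace labels are assumptions, and it interprets "each workspace" as every workspace you pass in. It produces a list of (workspace, statement) pairs and doesn't execute anything.

```python
# Hypothetical planner for the BROWSE grants described above.


def browse_grant_plan(catalogs, workspaces, service_principal_id):
    """Return (workspace, GRANT statement) pairs.

    catalogs:   dict mapping catalog name -> True if the catalog is private
    workspaces: list of workspace labels; the first is used for public catalogs
    """
    if not workspaces:
        raise ValueError("At least one workspace is required")
    plan = []
    for catalog, is_private in catalogs.items():
        statement = (
            f"GRANT BROWSE ON CATALOG {catalog} TO `{service_principal_id}`;"
        )
        # Private catalogs: grant from each workspace. Public: any one workspace.
        targets = workspaces if is_private else workspaces[:1]
        plan.extend((workspace, statement) for workspace in targets)
    return plan


if __name__ == "__main__":
    # Catalog names and workspace labels are placeholders.
    plan = browse_grant_plan(
        {"main": False, "finance": True},
        ["workspace-a", "workspace-b"],
        "12345678-1234-1234-1234-123456789012",
    )
    for workspace, statement in plan:
        print(f"{workspace}: {statement}")
```

With the sample input, the public catalog main gets a single grant from the first workspace, while the private catalog finance gets one grant per workspace.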
Need help?
- Check Cross-workspace extraction setup FAQ for common questions about cross-workspace extraction
- Check Troubleshooting cross-workspace extraction issues for common issues
- Contact Atlan support for help with setup or integration
Next steps
- Crawl Databricks - Set up and run a workflow to extract metadata from your Databricks instance using direct, offline, or agent extraction methods