Cross-workspace extraction
This FAQ addresses common questions about setting up and configuring cross-workspace extraction for Databricks. Cross-workspace extraction enables you to use a single service principal to crawl metadata from all workspaces within a Databricks metastore, eliminating the need for separate connections.
Why do you need cross-workspace extraction?
If you have multiple workspaces under the same metastore in your Databricks environment, this feature eliminates the need to set up a separate Databricks connection for each workspace. Instead, a single connection can extract metadata across all available workspaces in that metastore.
What are public and private catalogs?
- Public catalogs are available from all workspaces within a metastore.
- Private catalogs are restricted to specific workspaces and aren't available across the entire metastore.
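The distinction can be illustrated with a small model. This is only a sketch: the `Catalog` class, the `bound_workspaces` field, and the workspace and catalog names are hypothetical and aren't part of any Databricks or crawler API. It assumes a catalog with no workspace restrictions is public and a catalog restricted to specific workspaces is private.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Catalog:
    """Hypothetical model of a catalog inside one metastore."""
    name: str
    # Assumption: an empty set means the catalog is public, i.e. available
    # from every workspace in the metastore. A non-empty set lists the only
    # workspaces from which a private catalog is available.
    bound_workspaces: frozenset = frozenset()

    @property
    def is_public(self) -> bool:
        return not self.bound_workspaces


def visible_catalogs(catalogs, workspace):
    """Return the names of catalogs available from the given workspace."""
    return [
        c.name
        for c in catalogs
        if c.is_public or workspace in c.bound_workspaces
    ]


# Illustrative data only.
catalogs = [
    Catalog("sales"),                                       # public
    Catalog("finance_private", frozenset({"ws-finance"})),  # private
]

print(visible_catalogs(catalogs, "ws-finance"))
print(visible_catalogs(catalogs, "ws-marketing"))
```

In this model the public catalog appears for both workspaces, while the private catalog appears only for the workspace it is restricted to.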
If you have workspaces in different metastores, can one cross-workspace setup handle all of them, or do you need separate configurations?
One cross-workspace setup extracts metadata only from the workspaces within a single metastore, so workspaces in different metastores need separate configurations. The metastore used for extraction is the one that contains the workspace originally configured when setting up the Databricks crawler.
What happens if you add new workspaces to your metastore? Are they automatically included in the extraction?
Yes, provided that the common service principal has the necessary permissions on the newly added workspace.
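The scoping rules from the last two answers can be summarized as a rough sketch. The `Workspace` class, the `principal_has_access` flag, and all names below are hypothetical and only model the rules described here; they don't reflect how the crawler is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical model of a Databricks workspace."""
    name: str
    metastore: str
    # Assumption: True when the common service principal has the
    # necessary permissions on this workspace.
    principal_has_access: bool


def workspaces_in_scope(configured, all_workspaces):
    """Return the names of workspaces one cross-workspace setup would cover.

    A workspace is in scope when it shares the configured workspace's
    metastore and the service principal has access to it.
    """
    return [
        w.name
        for w in all_workspaces
        if w.metastore == configured.metastore and w.principal_has_access
    ]


# Illustrative data only.
configured = Workspace("ws-main", "metastore-a", True)
workspaces = [
    configured,
    Workspace("ws-new", "metastore-a", True),     # newly added, permitted
    Workspace("ws-locked", "metastore-a", False),  # same metastore, no access
    Workspace("ws-other", "metastore-b", True),    # different metastore
]

print(workspaces_in_scope(configured, workspaces))
```

Here the newly added workspace is picked up without any change to the setup, while the workspace without permissions and the workspace in another metastore are left out.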
What are the current limitations of cross-workspace extraction?
Cross-workspace extraction has the following limitations:
- Preflight check limitations: The validation capabilities are limited in cross-workspace scenarios:
  - System Tables check is limited to validating access to the system.access.workspace_latest table only; cross-workspace validation isn't performed
  - Tags check is restricted to the current configured workspace only
  - Schema check is performed across available cross-workspaces and the configured workspace
- Reverse tag sync limitations: There are restrictions on bidirectional synchronization:
  - Reverse sync is limited to the configured Databricks workspace and isn't supported for cross-workspaces