Databricks
Overview: Catalog Databricks workspaces, databases, schemas, and tables in Atlan. Gain visibility into lineage, usage, and governance for your Databricks assets.
Get started
Start by setting up the Databricks connector in Atlan. This involves configuring authentication and connection settings so Atlan can access your Databricks workspace.
- Set up the connector: Configure the Databricks connector with authentication credentials and connection settings.
- Additional configurations needed for your environment:
  - Enable SSO for Databricks: Set up SSO authentication if your organization uses single sign-on for Databricks access.
  - Set up cross-workspace extraction: Configure a single service principal to crawl metadata from all workspaces within a Databricks metastore. This is useful when multiple workspaces share the same metastore.
Crawl assets
After setting up the connector, crawl your Databricks assets to discover and catalog them in Atlan:
- Crawl Databricks assets: Discover and catalog your Databricks assets in Atlan.
Advanced setup
Use these guides for specialized deployment scenarios or additional configuration options: when your Databricks environment requires specific network configurations or runs on-premises, or when you need to extract lineage and usage metrics or manage tags.
Lineage and usage
Extract lineage and usage metrics to understand how data flows through your Databricks assets and which assets are most frequently accessed:
- Extract lineage and usage from Databricks: Extract lineage and usage metrics from your Databricks assets.
On-premises
- Set up on-premises Databricks access: Configure Atlan to access on-premises Databricks environments.
- Crawl on-premises Databricks: Crawl metadata from on-premises Databricks environments.
- Set up on-premises Databricks lineage extraction: Prepare for offline lineage extraction from on-premises Databricks.
- Extract on-premises Databricks lineage: Step-by-step instructions for extracting lineage from on-premises Databricks.
Private network
Set up private network connections to Databricks when you need secure, private connectivity without exposing your Databricks workspace to the public internet:
- Set up an AWS private network link to Databricks: Establish a secure, private network connection to Databricks on AWS.
- Set up an Azure private network link to Databricks: Establish a secure, private network connection to Databricks on Azure.
Tag management
Configure and manage tags in Databricks to enhance metadata governance:
- Manage Databricks tags: Configure and manage tags in Databricks.
Concepts
- Lineage filtering approaches: Understand how Atlan filters Databricks lineage data to show only valid lineage relationships.
References
- What does Atlan crawl from Databricks: Learn about the Databricks assets and metadata that Atlan discovers and catalogs.
- Preflight checks for Databricks: Verify prerequisites before setting up the Databricks connector.
Troubleshooting
- Databricks connectivity: Resolve common Databricks connection issues and errors.
- Cross-workspace extraction issues: Troubleshoot common issues in Databricks cross-workspace extraction, with the error, cause, and solution for each.
FAQ
- Cross-workspace extraction setup: Frequently asked questions about setting up and configuring cross-workspace extraction.
- Atlan vs Databricks lineage: Frequently asked questions about how Atlan lineage differs from Databricks native lineage.