Connect Databricks to Lakehouse
This guide walks you through connecting Databricks to your Lakehouse using foreign Iceberg tables in Unity Catalog, so you can run Databricks queries on your Atlan metadata.
Prerequisites
Before you begin, make sure that:
- You have enabled Lakehouse for your Atlan tenant. See Enable Lakehouse.
- You have your catalog credentials: Catalog URI, Catalog Name, Warehouse Name, and OAuth credentials (Client ID and Client Secret). You can find these in the Atlan UI under Workflows > Marketplace > Atlan Lakehouse > Connection Details. If you don't see them, contact Atlan Support.
- You have permission in your Databricks workspace to create storage credentials, external locations, and Unity Catalogs.
- The required Python dependency is installed automatically by the first cell of each notebook:
  %pip install pyiceberg
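If you want to sanity-check these credentials outside the provided notebooks, the same values map onto pyiceberg's standard REST catalog properties. A minimal sketch, assuming pyiceberg's documented REST catalog options (`make_catalog_config` is an illustrative helper, not part of Atlan's scripts):

```python
def make_catalog_config(uri: str, client_id: str, client_secret: str, warehouse: str) -> dict:
    """Build pyiceberg REST catalog properties from Atlan's connection details.

    These are pyiceberg's standard REST catalog options; pass the result to
    pyiceberg.catalog.load_catalog(). Placeholder values are illustrative.
    """
    return {
        "type": "rest",
        "uri": uri,                                    # Catalog URI provided by Atlan
        "credential": f"{client_id}:{client_secret}",  # OAuth Client ID and Client Secret
        "warehouse": warehouse,                        # Warehouse name provided by Atlan
    }

# Usage (inside a Databricks notebook, after %pip install pyiceberg):
# from pyiceberg.catalog import load_catalog
# catalog = load_catalog("atlan", **make_catalog_config(
#     "https://<tenant>.atlan.com/api/polaris/api/catalog",
#     "<CLIENT_ID>", "<CLIENT_SECRET>", "<WAREHOUSE_NAME>"))
# catalog.list_namespaces()
```

Calling `load_catalog` with the returned dict opens a connection to the Lakehouse catalog, so run it somewhere with network access to your Atlan tenant.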
Enable Databricks private preview
The foreign Iceberg tables workaround relies on a Databricks Private Preview feature that must be enabled on your workspace before you proceed.
1. Contact your Databricks account representative and request that the Foreign Iceberg Tables Private Preview feature be enabled on your workspace.
2. Once your Databricks representative confirms the feature is enabled, notify Atlan Support. Atlan uses this confirmation to prepare your storage access details and credentials, and notifies you when they're ready. At that point, continue with Set up Unity Catalog access.
Set up Unity Catalog access
Once Atlan provides your storage access details and credentials, follow the setup steps for your storage backend.
AWS (S3)
Atlan provides the following: IAM Role ARN, Amazon S3 bucket path, and OAuth credentials.
1. In your Databricks workspace, create a storage credential using the IAM Role ARN provided by Atlan:
- Navigate to Catalog Explorer > Credentials > Create Credential.
- Select AWS IAM Role as the credential type.
- Enter the IAM Role ARN provided by Atlan.
- In Advanced Options, enable Limit to read-only use.
2. Send the IAM Role ARN and External ID of the storage credential to Atlan Support. Atlan uses these to grant your credential read access to the S3 bucket. Wait for Atlan to confirm that access has been granted before proceeding to step 3.
3. Once Atlan confirms access, create an external location in Unity Catalog pointing to the S3 path provided by Atlan:
- Navigate to Catalog Explorer > External Locations > Create External Location.
- Select Manual and choose Amazon S3 as the storage type.
- Enter the Amazon S3 path provided by Atlan.
- Select the credential created in step 1.
- Enable Read-only mode in advanced options, then click Create.
4. Create a target catalog where the foreign Iceberg tables are registered. This is the catalog name you configure as DBX_CATALOG_NAME in the scripts:
- Navigate to Catalog > Create a Catalog.
- Enter a catalog name and select Standard type.
- For storage location, use a customer-managed storage location (not Atlan-managed).
Azure (ADLS)
Atlan provides the following: Service Principal credentials (Directory/Tenant ID, Application/Client ID, Client Secret), Storage Account name, and OAuth credentials.
1. Create a storage credential in Unity Catalog using the Service Principal credentials provided by Atlan. In the Databricks UI:
- Navigate to Catalog Explorer > Credentials > Create Credential.
- Select Azure Service Principal as the credential type.
- Enter the Directory (Tenant) ID, Application (Client) ID, and Client Secret provided by Atlan.
Alternatively, use the Databricks CLI:
databricks storage-credentials create --json '{
  "name": "<credential-name>",
  "azure_service_principal": {
    "directory_id": "<DIRECTORY_ID>",
    "application_id": "<APPLICATION_ID>",
    "client_secret": "<CLIENT_SECRET>"
  }
}'
2. Create an external location in Unity Catalog pointing to the ADLS path provided by Atlan:
- Navigate to Catalog Explorer > External Locations > Create External Location.
- Select Manual and choose Azure Data Lake Storage as the storage type.
- Enter the ADLS path in the following format: abfss://objectstore@<storage-account-name>.dfs.core.windows.net/atlan-wh/
- Select the credential created in step 1.
- In Advanced Options, enable Limit to read-only use.
- Click Test Connection to validate, then click Create.
3. Create a target catalog where the foreign Iceberg tables are registered. This is the catalog name you configure as DBX_CATALOG_NAME in the scripts:
- Navigate to Catalog > Create a Catalog.
- Enter a catalog name and select Standard type.
- Make sure the catalog storage is hosted on your own Azure tenant, not Atlan's.
Create foreign Iceberg tables
Once Unity Catalog access is set up:
1. Download the create script dbx_foreign_iceberg_tables_create.py from the Lakehouse Solutions repository and import it as a Databricks notebook. Set the following values in the Configuration cell:
- CLIENT_ID: OAuth Client ID provided by Atlan
- CLIENT_SECRET: OAuth Client Secret provided by Atlan
- POLARIS_CATALOG_URI: Catalog URI provided by Atlan (for example, https://<tenant>.atlan.com/api/polaris/api/catalog)
- CATALOG_NAME: Polaris catalog name provided by Atlan
- WAREHOUSE_NAME: Polaris warehouse name provided by Atlan
- DBX_CATALOG_NAME: Target Unity Catalog name created in the previous section
- HISTORY_NAMESPACE_SYNC: Set to true to include the atlan-history namespace (default: false)
2. Run the notebook. The script autodetects the Polaris warehouse, discovers all namespaces and tables, and creates schemas and foreign Iceberg tables in the target Unity Catalog. The script uses CREATE TABLE IF NOT EXISTS, making it safe to re-run.
Example: To verify the setup and query metadata for assets registered in Atlan:
SELECT *
FROM <DBX_CATALOG_NAME>.gold.assets
LIMIT 10;
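Conceptually, the create script's discovery pass lists every namespace in the Polaris catalog, then every table in each namespace, and emits idempotent DDL for each. A simplified sketch of that loop (`plan_create_statements` and the statement bodies are illustrative, not taken from Atlan's script; the actual script appends the foreign Iceberg table clauses to each CREATE TABLE statement):

```python
def plan_create_statements(catalog, dbx_catalog_name: str) -> list:
    """Plan idempotent DDL for every namespace and table in a catalog.

    `catalog` needs only list_namespaces() and list_tables(), mirroring
    pyiceberg's Catalog interface. The real create script adds the
    foreign Iceberg table clauses to each CREATE TABLE statement.
    """
    statements = []
    for namespace in catalog.list_namespaces():
        schema = ".".join(namespace)  # pyiceberg returns namespaces as tuples
        statements.append(f"CREATE SCHEMA IF NOT EXISTS {dbx_catalog_name}.{schema}")
        for identifier in catalog.list_tables(namespace):
            table = identifier[-1]  # identifier is a (namespace..., table) tuple
            statements.append(
                f"CREATE TABLE IF NOT EXISTS {dbx_catalog_name}.{schema}.{table}"
            )
    return statements
```

Because every statement uses IF NOT EXISTS, replaying the plan against a catalog that already contains some schemas and tables is harmless, which is what makes the notebook safe to re-run.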
Refresh foreign Iceberg tables
Foreign Iceberg tables don't sync automatically. Run the refresh script periodically to keep tables up to date with the latest Lakehouse data.
1. Download the refresh script dbx_foreign_iceberg_tables_refresh.py from the Lakehouse Solutions repository and import it as a Databricks notebook. Configure the same variables as the create script.
2. Schedule the notebook to run on a recurring basis:
- Maximum frequency: No more than once every 30 minutes
The refresh script uses REFRESH TABLE to update metadata pointers without recreating tables.
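If you schedule the refresh notebook as a Databricks job, the job's schedule block can express the 30-minute cadence with a Quartz cron expression. A sketch of the relevant Jobs API fragment (field names follow the Databricks Jobs API; adjust the timezone to suit):

```json
{
  "schedule": {
    "quartz_cron_expression": "0 0/30 * * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED"
  }
}
```

The expression `0 0/30 * * * ?` fires at minutes 0 and 30 of every hour, matching the documented maximum frequency.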
Troubleshooting
If you have any issues configuring or querying foreign Iceberg tables in Databricks, see Troubleshooting Databricks errors.
Next steps
Now that Databricks is connected to Lakehouse, you can:
- Query Atlan metadata from Databricks: See the available metadata tables in Entity metadata reference.
- Use cases: Explore popular patterns such as metadata enrichment tracking, lineage impact analysis, and glossary alignment in Use cases.