Connect Databricks to Lakehouse

This guide walks you through connecting Databricks to your Lakehouse using foreign Iceberg tables in Unity Catalog, so you can run Databricks queries on your Atlan metadata.

Prerequisites

Before you begin, make sure that:

  • You have enabled Lakehouse for your Atlan tenant. See Enable Lakehouse.

  • You have your catalog credentials: Catalog URI, Catalog Name, Warehouse Name, and OAuth credentials (Client ID and Client Secret). You can find these in the Atlan UI under Workflows > Marketplace > Atlan Lakehouse > Connection Details. If you don't see them, contact Atlan Support.

  • You have permission in your Databricks workspace to create storage credentials, external locations, and Unity Catalogs.

  • No manual Python setup is needed: the required pyiceberg dependency is installed automatically by the first cell of each notebook:

    %pip install pyiceberg

Enable Databricks private preview

The foreign Iceberg tables workaround requires a Databricks Private Preview feature that must be enabled on your workspace before proceeding.

  1. Contact your Databricks account representative and request enablement of the Foreign Iceberg Tables Private Preview feature on your workspace.

  2. Once your Databricks representative confirms the feature is enabled, notify Atlan Support. Atlan uses this confirmation to prepare your storage access details and credentials. You'll be notified when they're ready; once Atlan confirms, continue with Set up Unity Catalog access.

Set up Unity Catalog access

Once Atlan provides your storage access details and credentials, follow the setup steps for your storage backend.

Atlan provides the following: IAM Role ARN, Amazon S3 bucket path, and OAuth credentials.

  1. In your Databricks workspace, create a storage credential using the IAM Role ARN provided by Atlan:

    • Navigate to Catalog Explorer > Credentials > Create Credential.
    • Select AWS IAM Role as the credential type.
    • Enter the IAM Role ARN provided by Atlan.
    • In Advanced Options, enable Limit to read-only use.
  2. Send the IAM Role ARN and External ID of the storage credential to Atlan Support. Atlan uses these to grant your credential read access to the S3 bucket. You'll be notified when access is granted; wait for this confirmation before proceeding to step 3.

  3. Once Atlan confirms access, create an external location in Unity Catalog pointing to the S3 path provided by Atlan:

    • Navigate to Catalog Explorer > External Locations > Create External Location.
    • Select Manual and choose Amazon S3 as the storage type.
    • Enter the Amazon S3 path provided by Atlan.
    • Select the credential created in step 1.
    • Enable Read-only mode in advanced options, then click Create.
  4. Create a target catalog where the foreign Iceberg tables will be registered. This is the catalog name you configure as DBX_CATALOG_NAME in the scripts:

    • Navigate to Catalog > Create a Catalog.
    • Enter a catalog name and select Standard type.
    • For storage location, use a customer-managed storage location (not Atlan-managed).
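If you prefer to script steps 3 and 4 rather than click through Catalog Explorer, the equivalent Databricks SQL can be generated as below. This is a sketch only: all names and paths here (`atlan_lakehouse_cred`, `atlan_lakehouse_loc`, the S3 URLs) are hypothetical placeholders for the values from step 1 and the details Atlan provides, and the read-only settings from the UI steps still need to be applied.

```python
# Hypothetical placeholders -- substitute the values Atlan provides and the
# names you chose in the earlier steps before running the generated SQL.
S3_PATH = "s3://atlan-lakehouse-example/warehouse"     # S3 path from Atlan
CREDENTIAL_NAME = "atlan_lakehouse_cred"               # storage credential from step 1
LOCATION_NAME = "atlan_lakehouse_loc"                  # external location name (step 3)
DBX_CATALOG_NAME = "atlan_lakehouse"                   # target catalog (step 4)
MANAGED_LOCATION = "s3://my-company-uc-storage/atlan"  # customer-managed storage


def external_location_ddl(name: str, url: str, credential: str) -> str:
    """Build the CREATE EXTERNAL LOCATION statement for step 3."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {name}\n"
        f"URL '{url}'\n"
        f"WITH (STORAGE CREDENTIAL {credential})"
    )


def catalog_ddl(name: str, managed_location: str) -> str:
    """Build the CREATE CATALOG statement for step 4."""
    return (
        f"CREATE CATALOG IF NOT EXISTS {name}\n"
        f"MANAGED LOCATION '{managed_location}'"
    )


print(external_location_ddl(LOCATION_NAME, S3_PATH, CREDENTIAL_NAME))
print(catalog_ddl(DBX_CATALOG_NAME, MANAGED_LOCATION))
```

Run the printed statements in a Databricks SQL editor or notebook cell; both are idempotent (`IF NOT EXISTS`), so re-running them is safe.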

Create foreign Iceberg tables

Once Unity Catalog access is set up:

  1. Download the create script dbx_foreign_iceberg_tables_create.py from the Lakehouse Solutions repository and import it as a Databricks notebook. Set the following values in the Configuration cell:

    • CLIENT_ID: OAuth Client ID provided by Atlan
    • CLIENT_SECRET: OAuth Client Secret provided by Atlan
    • POLARIS_CATALOG_URI: Catalog URI provided by Atlan (for example, https://<tenant>.atlan.com/api/polaris/api/catalog)
    • CATALOG_NAME: Polaris catalog name provided by Atlan
    • WAREHOUSE_NAME: Polaris warehouse name provided by Atlan
    • DBX_CATALOG_NAME: Target Unity Catalog name created in the previous section
    • HISTORY_NAMESPACE_SYNC: Set to true to include the atlan-history namespace (default: false)
  2. Run the notebook. The script autodetects the Polaris warehouse, discovers all namespaces and tables, and creates schemas and foreign Iceberg tables in the target Unity Catalog. The script uses CREATE TABLE IF NOT EXISTS, making it safe to re-run.

    Example: To verify the setup and query metadata for assets registered in Atlan:

    SELECT *
    FROM <DBX_CATALOG_NAME>.gold.assets
    LIMIT 10;
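The structure of the create script can be sketched as a discover-and-register loop. The pyiceberg calls below (`load_catalog`, `list_namespaces`, `list_tables`) are the real pyiceberg REST catalog API, but everything else here is an assumption rather than the shipped script: the function names are illustrative, and the exact foreign Iceberg table DDL depends on the Databricks private preview, so the statement built here is a placeholder.

```python
# Illustrative sketch of the create script's shape; not the shipped script.

def discover_tables(uri: str, client_id: str, client_secret: str, warehouse: str):
    """Enumerate table identifiers from the Polaris catalog.

    Requires pyiceberg (installed by the notebook's %pip cell) and network
    access to the catalog, so it is defined but not called in this sketch.
    """
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "atlan",
        uri=uri,
        credential=f"{client_id}:{client_secret}",
        warehouse=warehouse,
    )
    for namespace in catalog.list_namespaces():
        for identifier in catalog.list_tables(namespace):
            yield identifier


def foreign_table_ddl(dbx_catalog: str, namespace: str, table: str,
                      metadata_location: str) -> str:
    """Build an idempotent registration statement (placeholder syntax --
    the private preview defines the real foreign Iceberg table DDL)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {dbx_catalog}.{namespace}.{table} "
        f"USING ICEBERG LOCATION '{metadata_location}'"
    )


# The atlan-history namespace is skipped unless HISTORY_NAMESPACE_SYNC is true.
HISTORY_NAMESPACE_SYNC = False
namespaces = ["gold", "atlan-history"]  # hypothetical discovery result
to_sync = [ns for ns in namespaces
           if HISTORY_NAMESPACE_SYNC or ns != "atlan-history"]
print(to_sync)  # ['gold']
```

Because every generated statement uses `CREATE TABLE IF NOT EXISTS`, the loop can be re-run without affecting tables that are already registered, which matches the script's documented behavior.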

Refresh foreign Iceberg tables

Foreign Iceberg tables don't sync automatically. Run the refresh script periodically to keep tables up to date with the latest Lakehouse data.

  1. Download the refresh script dbx_foreign_iceberg_tables_refresh.py from the Lakehouse Solutions repository and import it as a Databricks notebook. Configure the same variables as the create script.

  2. Schedule the notebook to run on a recurring basis:

    • Maximum frequency: No more than once every 30 minutes

    The refresh script uses REFRESH TABLE to update metadata pointers without recreating tables.
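The refresh pass can be sketched as emitting one `REFRESH TABLE` statement per registered table. `REFRESH TABLE` is standard Databricks SQL; the helper name and the example table list below are hypothetical, and the real script discovers the tables from the Polaris catalog rather than hard-coding them.

```python
def refresh_statements(dbx_catalog: str, tables: list) -> list:
    """One REFRESH TABLE per registered foreign Iceberg table; this updates
    the metadata pointer without recreating the table."""
    return [f"REFRESH TABLE {dbx_catalog}.{t}" for t in tables]


# Hypothetical table list for illustration only.
for stmt in refresh_statements("atlan_lakehouse", ["gold.assets", "gold.lineage"]):
    print(stmt)
```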

Troubleshooting

If you have any issues configuring or querying foreign Iceberg tables in Databricks, see Troubleshooting Databricks errors.

Next steps

Now that Databricks is connected to Lakehouse, you can:

  • Query Atlan metadata from Databricks: See the available metadata tables in Entity metadata reference.
  • Use cases: Explore popular patterns such as metadata enrichment tracking, lineage impact analysis, and glossary alignment in Use cases.