Connect BigQuery to Lakehouse

This guide walks you through how to connect BigQuery to your Lakehouse using external Iceberg tables, so you can start running BigQuery queries on your Atlan metadata.

Prerequisites

Before you begin, make sure that:

  • You have enabled Lakehouse for your Atlan tenant. See Enable Lakehouse.

  • You have your catalog credentials: GCS region, Catalog URI, Catalog Name, Warehouse Name, and OAuth credentials (Client ID and Client Secret). You can find these in the Atlan UI under Workflows > Marketplace > Atlan Lakehouse > Connection Details. If you don't see them, contact Atlan Support.

  • All four resources are in the same region: the GCS bucket, BigQuery connection, BigQuery dataset, and query execution location. Region mismatches cause location-related errors.

  • You have installed the required Python dependencies:

    pip install "pyiceberg[pyarrow]" google-cloud-bigquery

Set up external tables in BigQuery

  1. In your GCP project, create a Cloud Resource connection in the same region as the GCS data provided by Atlan.

    • In the BigQuery console, navigate to Explorer > your project > + Add data.
    • Search for cloud resource, then select Vertex AI > BigQuery federation.
    • Enter a connection name (for example, atlan-mdlh-conn) and select the region provided by Atlan.
    • On the connection info page, note the Service Account ID.
  2. Send the Service Account ID to Atlan Support. Atlan uses it to grant read access to the GCS bucket containing the Lakehouse data. Atlan notifies you when access is enabled. Once confirmed, continue with Configure and run script.

Configure and run script

Once Atlan has confirmed that data sharing is enabled:

  1. Download bq_external_iceberg_tables_create_refresh.py from the Lakehouse Solutions repository. Set the following values as environment variables or edit them directly in the script:

    • BQ_PROJECT_ID: GCP project ID where the connection was created
    • BQ_LOCATION: GCS/BigQuery region (must match across all resources)
    • BQ_CONNECTION_ID: Connection name created in the previous section
    • POLARIS_CATALOG_URI: Catalog URI provided by Atlan
    • CLIENT_ID: OAuth Client ID provided by Atlan
    • CLIENT_SECRET: OAuth Client Secret provided by Atlan
    • ENABLE_HISTORY_NAMESPACE_SYNC: Set to true to include the atlan-history namespace (default: false)
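    For example, the variables above can be exported in your shell before running the script. All values shown are placeholders; substitute the credentials from your Atlan connection details:

    ```shell
    # Placeholder values -- replace with the details provided by Atlan.
    export BQ_PROJECT_ID="my-gcp-project"
    export BQ_LOCATION="us-central1"
    export BQ_CONNECTION_ID="atlan-mdlh-conn"
    export POLARIS_CATALOG_URI="https://example.atlan.com/api/catalog"
    export CLIENT_ID="your-client-id"
    export CLIENT_SECRET="your-client-secret"
    export ENABLE_HISTORY_NAMESPACE_SYNC="false"
    ```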
  2. In your terminal, run the script:

    python bq_external_iceberg_tables_create_refresh.py

    The script autodetects the Polaris warehouse, discovers all namespaces and tables, and creates BigQuery datasets per namespace.

    For example, to verify the setup, query metadata for assets registered in Atlan:

    SELECT *
    FROM `<PROJECT_ID>.gold.assets`
    LIMIT 10;
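
    The same verification query can also be run programmatically with the google-cloud-bigquery client. This is a minimal sketch, not part of the Atlan script; the project ID is a placeholder, and the `gold.assets` table follows the SQL example above:

    ```python
    def verification_query(project_id: str) -> str:
        """Build the sample query against the gold.assets external table."""
        return f"SELECT * FROM `{project_id}.gold.assets` LIMIT 10"

    def run_verification(project_id: str) -> None:
        """Run the verification query and print each row as a dict."""
        # Third-party dependency; requires application-default credentials
        # with BigQuery access in your environment.
        from google.cloud import bigquery

        client = bigquery.Client(project=project_id)
        for row in client.query(verification_query(project_id)).result():
            print(dict(row))

    if __name__ == "__main__":
        run_verification("my-gcp-project")  # placeholder project ID
    ```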

    Dataset names follow BigQuery naming rules: hyphens are converted to underscores (for example, atlan-ns becomes atlan_ns). External Iceberg tables are created or replaced in each dataset using CREATE OR REPLACE EXTERNAL TABLE, so the script is safe to re-run for both initial setup and ongoing refresh.
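
    The naming and DDL behavior described above can be sketched as follows. This is illustrative only; the actual script may implement it differently, and all names here are placeholders:

    ```python
    import re

    def to_dataset_name(namespace: str) -> str:
        """Convert an Iceberg namespace into a valid BigQuery dataset name.

        BigQuery dataset names allow only letters, digits, and underscores,
        so characters such as hyphens are replaced with underscores.
        """
        return re.sub(r"[^0-9A-Za-z_]", "_", namespace)

    def external_table_ddl(project: str, dataset: str, table: str,
                           connection: str, metadata_uri: str) -> str:
        """Build a CREATE OR REPLACE EXTERNAL TABLE statement for an Iceberg table."""
        return (
            f"CREATE OR REPLACE EXTERNAL TABLE `{project}.{dataset}.{table}`\n"
            f"WITH CONNECTION `{connection}`\n"
            f"OPTIONS (format = 'ICEBERG', uris = ['{metadata_uri}'])"
        )

    # The namespace atlan-ns maps to the dataset atlan_ns.
    print(to_dataset_name("atlan-ns"))  # -> atlan_ns
    ```

    Because the DDL uses CREATE OR REPLACE, re-running it refreshes an existing table rather than failing.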

Refresh external tables

The external tables don't sync automatically. Re-run the script periodically to keep them up to date with the latest Lakehouse data.

  1. In your terminal, run the script:

    python bq_external_iceberg_tables_create_refresh.py
  2. Schedule the script to run on a recurring basis:

    • Maximum frequency: No more than once every 30 minutes
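
    One way to schedule the refresh is a cron entry. This example runs the script every 30 minutes, matching the maximum frequency above; the interpreter, script, and log paths are placeholders for your environment:

    ```shell
    # Refresh Lakehouse external tables every 30 minutes (placeholder paths).
    */30 * * * * /usr/bin/python3 /opt/atlan/bq_external_iceberg_tables_create_refresh.py >> /var/log/atlan_refresh.log 2>&1
    ```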

Troubleshooting

If you have any issues configuring or querying external Iceberg tables in BigQuery, see Troubleshooting BigQuery errors.

Next steps

Now that BigQuery is connected to Lakehouse, you can:

  • Query Atlan metadata from BigQuery: See the available metadata tables in Entity metadata reference.
  • Use cases: Explore popular patterns such as metadata enrichment tracking, lineage impact analysis, and glossary alignment in Use cases.