Connect BigQuery to Lakehouse
This guide walks you through how to connect BigQuery to your Lakehouse using external Iceberg tables, so you can start running BigQuery queries on your Atlan metadata.
Prerequisites
Before you begin, make sure that:
- You have enabled Lakehouse for your Atlan tenant. See Enable Lakehouse.
- You have your catalog credentials: GCS region, Catalog URI, Catalog Name, Warehouse Name, and OAuth credentials (Client ID and Client Secret). You can find these in the Atlan UI under Workflows > Marketplace > Atlan Lakehouse > Connection Details. If you don't see them, contact Atlan Support.
- All four resources are in the same region: the GCS bucket, BigQuery connection, BigQuery dataset, and query execution location. Region mismatches cause location-related errors.
- You have installed the required Python dependencies:

  pip install "pyiceberg[pyarrow]" google-cloud-bigquery
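Before moving on to BigQuery, you can sanity-check the catalog credentials directly with pyiceberg. The sketch below builds the REST catalog properties and, when credentials are present in the environment, opens the catalog and lists its namespaces. The property names follow pyiceberg's REST catalog configuration; the environment variable names (including WAREHOUSE_NAME) are assumptions, not values mandated by Atlan:

```python
import os

def rest_catalog_properties(uri: str, client_id: str, client_secret: str, warehouse: str) -> dict:
    """Build pyiceberg REST catalog properties for an OAuth2 client-credentials login."""
    return {
        "type": "rest",
        "uri": uri,
        # pyiceberg expects the OAuth credential as "client_id:client_secret"
        "credential": f"{client_id}:{client_secret}",
        "warehouse": warehouse,
    }

# Only attempt a live connection when the credentials are actually set.
if os.environ.get("POLARIS_CATALOG_URI"):
    from pyiceberg.catalog import load_catalog  # requires pyiceberg[pyarrow]

    catalog = load_catalog(
        "atlan",
        **rest_catalog_properties(
            os.environ["POLARIS_CATALOG_URI"],
            os.environ["CLIENT_ID"],
            os.environ["CLIENT_SECRET"],
            os.environ["WAREHOUSE_NAME"],  # hypothetical variable name
        ),
    )
    print(catalog.list_namespaces())
```

If the call succeeds and prints namespaces, the Catalog URI and OAuth credentials are valid and you can proceed to the BigQuery setup.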
Set up external tables in BigQuery
1. In your GCP project, create a Cloud Resource connection in the same region as the GCS data provided by Atlan.

   Using the BigQuery UI:

   - In the BigQuery console, navigate to Explorer > your project > + Add data.
   - Search for cloud resource, then select Vertex AI > BigQuery federation.
   - Enter a connection name (for example, atlan-mdlh-conn) and select the region provided by Atlan.
   - On the connection info page, note the Service Account ID.

   Using the CLI, run the following command to create the connection:

   bq mk --connection \
     --project_id=<PROJECT_ID> \
     --location=<REGION> \
     --connection_type=CLOUD_RESOURCE \
     atlan-mdlh-conn

   Then retrieve the Service Account ID:

   bq show --connection --location=<REGION> <CONNECTION_ID>

2. Send the Service Account ID to Atlan Support. Atlan uses it to grant read access to the GCS bucket containing the Lakehouse data. You're notified when access is enabled; continue with Configure and run script once confirmed.
Configure and run script
Once Atlan has confirmed that data sharing is enabled:
1. Download bq_external_iceberg_tables_create_refresh.py from the Lakehouse Solutions repository. Set the following values as environment variables or edit them directly in the script:

   - BQ_PROJECT_ID: GCP project ID where the connection was created
   - BQ_LOCATION: GCS/BigQuery region (must match across all resources)
   - BQ_CONNECTION_ID: Connection name created in the previous section
   - POLARIS_CATALOG_URI: Catalog URI provided by Atlan
   - CLIENT_ID: OAuth Client ID provided by Atlan
   - CLIENT_SECRET: OAuth Client Secret provided by Atlan
   - ENABLE_HISTORY_NAMESPACE_SYNC: Set to true to include the atlan-history namespace (default: false)
2. In your terminal, run the script:

   python bq_external_iceberg_tables_create_refresh.py

   The script autodetects the Polaris warehouse, discovers all namespaces and tables, and creates BigQuery datasets per namespace.
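Conceptually, the script renders one DDL statement per Iceberg table. The sketch below shows the BigQuery DDL pattern for an external Iceberg table; the helper name external_table_ddl and the metadata URI are illustrative, not taken from the script itself:

```python
def external_table_ddl(project: str, dataset: str, table: str,
                       connection: str, metadata_uri: str) -> str:
    """Render a CREATE OR REPLACE EXTERNAL TABLE statement for an Iceberg table.

    `connection` is the fully qualified connection, e.g. "my-project.us.atlan-mdlh-conn";
    `metadata_uri` points at the table's current Iceberg metadata JSON file in GCS.
    """
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{project}.{dataset}.{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = 'ICEBERG', uris = ['{metadata_uri}'])"
    )

ddl = external_table_ddl(
    "my-project", "gold", "assets",
    "my-project.us.atlan-mdlh-conn",
    "gs://atlan-lakehouse/gold/assets/metadata/v42.metadata.json",  # illustrative path
)
print(ddl)
```

Each statement would then be submitted with google-cloud-bigquery, for example `bigquery.Client(project=..., location=...).query(ddl).result()`.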
Example: to verify the setup and query metadata for assets registered in Atlan:

SELECT *
FROM `<PROJECT_ID>.gold.assets`
LIMIT 10;

Dataset names follow BigQuery naming rules: hyphens are converted to underscores (for example, atlan-ns becomes atlan_ns). External Iceberg tables are created or replaced in each dataset using CREATE OR REPLACE EXTERNAL TABLE, so the script is safe to re-run for both initial setup and ongoing refresh.
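The hyphen-to-underscore conversion described above can be reproduced with a small helper (a sketch of the naming rule, not the script's actual code):

```python
import re

def bq_dataset_name(namespace: str) -> str:
    """Map an Iceberg namespace to a valid BigQuery dataset ID.

    BigQuery dataset IDs allow only letters, digits, and underscores,
    so any other character (such as a hyphen) is replaced with an underscore.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", namespace)

print(bq_dataset_name("atlan-ns"))  # atlan_ns
```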
Refresh external tables
The external tables don't sync automatically. Re-run the script periodically to keep them up to date with the latest Lakehouse data.
1. In your terminal, run the script:

   python bq_external_iceberg_tables_create_refresh.py

2. Schedule the script to run on a recurring basis:

   - Maximum frequency: no more than once every 30 minutes
Troubleshooting
If you have any issues configuring or querying external Iceberg tables in BigQuery, see Troubleshooting BigQuery errors.
Next steps
Now that BigQuery is connected to Lakehouse, you can:
- Query Atlan metadata from BigQuery: See the available metadata tables in Entity metadata reference.
- Use cases: Explore popular patterns such as metadata enrichment tracking, lineage impact analysis, and glossary alignment in Use cases.