Connect BigQuery to Lakehouse

BigQuery accesses Atlan's Lakehouse through external Iceberg tables you create using a provided Python script from the Lakehouse Solutions repository. The script discovers your Lakehouse namespaces and maps them to BigQuery datasets. Setup requires a BigQuery connection in GCP and a one-time storage access grant from Atlan Support. The connection type and region naming depend on whether your Atlan tenant's Lakehouse is backed by GCS or by S3:

GCS-backed tenants use a Cloud Resource connection and a standard Google Cloud region name (for example, us-east1).
S3-backed tenants use a BigLake on AWS connection (BigQuery Omni) and an AWS-prefixed location (for example, aws-us-east-1).

If you don't know which type your tenant uses, contact Atlan Support.
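The naming rule above can be sketched as a small helper. This is only an illustration: the function name and the "gcs"/"s3" labels are hypothetical and aren't part of Atlan's script.

```python
def bigquery_location(backend: str, region: str) -> str:
    """Return the BigQuery location string for a Lakehouse backend.

    backend: "gcs" or "s3" (hypothetical labels, not Atlan identifiers).
    region:  the storage region as provided by Atlan.
    """
    backend = backend.strip().lower()
    region = region.strip().lower()
    if backend == "gcs":
        # Cloud Resource connections use the plain region name.
        return region
    if backend == "s3":
        # BigQuery Omni locations carry an "aws-" prefix; avoid doubling it.
        return region if region.startswith("aws-") else f"aws-{region}"
    raise ValueError(f"unknown backend: {backend!r}")
```

The result is the value you would use for the connection, dataset, and query execution location.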

Prerequisites

Before you begin, make sure that:

You have installed the required Python dependencies:

pip install "pyiceberg[pyarrow]" google-cloud-bigquery

All four resources are in the same region: the storage bucket, BigQuery connection, BigQuery dataset, and query execution location. Region mismatches cause location-related errors. For S3-backed tenants, the BigQuery connection, dataset, and query execution location must use the AWS-prefixed form (for example, aws-us-east-1).
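As a rough pre-flight check, you could compare the locations you noted for each resource before running anything. The helper below is a hypothetical sketch: it only compares strings that you supply and doesn't call any Google Cloud API.

```python
def find_location_mismatches(expected: str, locations: dict[str, str]) -> list[str]:
    """Return the names of resources whose location differs from `expected`.

    The comparison ignores case and surrounding whitespace, since tools
    don't always report locations with the same casing.
    """
    expected_norm = expected.strip().lower()
    return [
        name
        for name, location in locations.items()
        if location.strip().lower() != expected_norm
    ]


# Example with made-up values for an S3-backed tenant:
mismatches = find_location_mismatches(
    "aws-us-east-1",
    {
        "connection": "aws-us-east-1",
        "dataset": "aws-us-east-1",
        "query_location": "us-east-1",  # missing the aws- prefix
    },
)
print(mismatches)
```

An empty list means every supplied location matches the expected one.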

Set up external tables in BigQuery

Before you begin, in your Atlan workspace, navigate to Workflow > Marketplace > Atlan Lakehouse > View connection details and note your catalog URI, catalog name, storage region, OAuth client ID, and OAuth client secret. If you don't see them, contact Atlan Support.

GCP (GCS)

For GCS-backed Lakehouse tenants.

In your GCP project, create a Cloud Resource connection in the same region as the GCS data provided by Atlan.
BigQuery UI:
- In the BigQuery console, navigate to Explorer > your project > + Add data.
- Search for cloud resource, then select Vertex AI > BigQuery federation.
- Enter a connection name (for example, atlan-mdlh-conn) and select the region provided by Atlan.
- On the connection info page, note the Service Account ID.

CLI:
Run the following command to create the connection:
bq mk --connection \
  --project_id=<PROJECT_ID> \
  --location=<REGION> \
  --connection_type=CLOUD_RESOURCE \
  atlan-mdlh-conn
Then retrieve the Service Account ID:
bq show --connection --location=<REGION> <CONNECTION_ID>
Send the Service Account ID to Atlan Support. Atlan grants the service account the Storage Object Viewer role (roles/storage.objectViewer) on the Lakehouse GCS bucket. This role includes storage.objects.list—required by BigQuery for file pattern expansion on external tables—and storage.objects.get for reading the data. You'll receive a confirmation email once access is enabled—continue with Configure and run script at that point.

note
If the role hasn't been granted yet, BigQuery queries on Lakehouse tables can fail with: Permission 'storage.objects.list' denied on resource (or it may not exist).

To set up multiple connections for different environments (for example, separate BQ_CONNECTION_ID values for dev and prod), include all service account IDs in a single support ticket. All connections access the same Lakehouse snapshot with no isolation between environments.

AWS (S3)

For S3-backed Lakehouse tenants.

Before proceeding, you must have received the IAM Role ARN and AWS region from Atlan. If you haven't received them, open a ticket with Atlan Support, specify your Atlan tenant URL (for example, yourcompany.atlan.com), and request BigQuery on Lakehouse access for an AWS-backed tenant.

In your GCP project, enable the BigQuery Connection API and the BigLake API. Both are required for BigLake on AWS (BigQuery Omni) connections.
Create a BigLake on AWS connection in the same aws-<region> location as the S3 data provided by Atlan, using the IAM Role ARN provided by Atlan.
BigQuery UI:
- In the BigQuery console, navigate to Explorer > your project > + Add data.
- Search for Amazon S3, then select BigQuery Omni federation.
- Enter a connection name (for example, atlan-mdlh-conn).
- For Connection location type, select AWS and choose the location provided by Atlan (for example, aws-us-east-1).
- Paste the IAM Role ARN provided by Atlan.
- On the connection info page, note the BigQuery Google identity (a numeric subject value).

CLI:
Run the following command to create the connection. Use the AWS-prefixed location (for example, aws-us-east-1) provided by Atlan:
bq mk --connection \
  --project_id=<PROJECT_ID> \
  --location=aws-<REGION> \
  --connection_type=AWS \
  --properties='{"crossAccountRole": "<ATLAN_IAM_ROLE_ARN>"}' \
  atlan-mdlh-conn
Then retrieve the Google identity of the connection:
bq show --connection --location=aws-<REGION> <PROJECT_ID>.aws-<REGION>.<CONNECTION_ID>
Look for the identity field in the output; it's a numeric subject value.
Send the BigQuery Google identity value, along with your Atlan tenant URL, to Atlan Support. Atlan adds this identity to the IAM role's trust policy, allowing BigQuery to assume the role and read the Lakehouse S3 bucket. You'll receive a confirmation email once access is enabled—continue with Configure and run script at that point.

note
If the trust policy hasn't been updated yet, BigQuery queries on Lakehouse tables can fail with: Connection ... isn't authorized to assume into your AWS IAM Role.

To set up multiple connections for different environments (for example, separate BQ_CONNECTION_ID values for dev and prod), include all Google identity values in a single support ticket. All connections access the same Lakehouse snapshot with no isolation between environments.
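If you wrap your first queries in automation, it can help to tell the "access not granted yet" errors apart from other failures. The sketch below matches on the error text quoted in the notes for each backend. The function name and return labels are hypothetical, and the exact wording of BigQuery error messages may vary.

```python
def diagnose_access_error(message: str) -> str:
    """Classify a BigQuery error message from a Lakehouse query.

    Returns one of three illustrative labels:
      "gcs_role_pending"  - the GCS bucket role likely hasn't been granted yet
      "aws_trust_pending" - the IAM role trust policy likely hasn't been updated yet
      "other"             - anything else
    """
    text = message.lower()
    if "storage.objects.list" in text and "denied" in text:
        return "gcs_role_pending"
    if "authorized to assume into your aws iam role" in text:
        return "aws_trust_pending"
    return "other"
```

In either pending case, the usual next step is to wait for the confirmation email from Atlan Support before retrying.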

Configure and run script

Once Atlan has confirmed that data sharing is enabled:

Download bq_external_iceberg_tables_create_refresh.py from the Lakehouse Solutions repository. Set the following values as environment variables or edit them directly in the script:
- BQ_PROJECT_ID: GCP project ID where the connection was created
- BQ_LOCATION: Storage/BigQuery region (must match across all resources). For S3-backed tenants, use the AWS-prefixed form (for example, aws-us-east-1).
- BQ_CONNECTION_ID: Connection name created in the previous section
- POLARIS_CATALOG_URI: Catalog URI provided by Atlan
- CLIENT_ID: OAuth Client ID provided by Atlan
- CLIENT_SECRET: OAuth Client Secret provided by Atlan
- ENABLE_HISTORY_NAMESPACE_SYNC: Set to true to include the atlan-history namespace (default: false)
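If you use environment variables, a short pre-flight check can catch missing values before the script runs. This is a hypothetical sketch, not part of the Atlan script. In particular, how the script parses ENABLE_HISTORY_NAMESPACE_SYNC is an assumption here (the literal string true, case-insensitive).

```python
import os
from typing import Mapping, Optional

# Settings listed above that have no documented default.
REQUIRED_VARS = (
    "BQ_PROJECT_ID",
    "BQ_LOCATION",
    "BQ_CONNECTION_ID",
    "POLARIS_CATALOG_URI",
    "CLIENT_ID",
    "CLIENT_SECRET",
)


def missing_settings(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def history_sync_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Interpret ENABLE_HISTORY_NAMESPACE_SYNC; defaults to False when unset."""
    env = os.environ if env is None else env
    # Assumption: only the string "true" (any casing) enables the sync.
    return env.get("ENABLE_HISTORY_NAMESPACE_SYNC", "false").strip().lower() == "true"
```

Calling missing_settings() with no arguments reads the current process environment; an empty list means all required values are present.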
In your terminal, run the script:
```
python bq_external_iceberg_tables_create_refresh.py
```
The script autodetects the Polaris warehouse, discovers all namespaces and tables, and creates BigQuery datasets per namespace.
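To give a sense of what the discovery step involves, here is a minimal sketch that uses pyiceberg's REST catalog support. It isn't the script's actual implementation. The function names are hypothetical, the catalog name is passed in explicitly rather than autodetected, and the PRINCIPAL_ROLE:ALL scope is an assumption based on common Polaris setups.

```python
def rest_catalog_properties(
    catalog_uri: str, catalog_name: str, client_id: str, client_secret: str
) -> dict:
    """Build pyiceberg REST catalog properties from the Atlan connection details."""
    return {
        "type": "rest",
        "uri": catalog_uri,
        "warehouse": catalog_name,
        # pyiceberg expects OAuth client credentials as "<id>:<secret>".
        "credential": f"{client_id}:{client_secret}",
        # Assumption: the scope commonly used with Polaris catalogs.
        "scope": "PRINCIPAL_ROLE:ALL",
    }


def discover_tables(properties: dict) -> dict:
    """Return {namespace: [table names]} for every namespace in the catalog."""
    # Imported lazily so the helper above works without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("lakehouse", **properties)
    found = {}
    for namespace in catalog.list_namespaces():
        # Namespaces and table identifiers are tuples of strings.
        tables = catalog.list_tables(namespace)
        found[".".join(namespace)] = [identifier[-1] for identifier in tables]
    return found
```

Running discover_tables requires network access to the catalog and valid credentials, so it's best tried only after Atlan has confirmed access.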

Example: To verify the setup and query metadata for assets registered in Atlan:
```
SELECT *
FROM `<PROJECT_ID>.gold.assets`
LIMIT 10;
```
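The same check can be run from Python with the google-cloud-bigquery client installed in the prerequisites. The sketch below passes the location explicitly so that the query executes in the same region as the dataset. The function names are hypothetical, and the gold.assets table is taken from the example above.

```python
def verification_query(
    project_id: str, dataset: str = "gold", table: str = "assets", limit: int = 10
) -> str:
    """Build the verification query shown above for a given project."""
    return f"SELECT *\nFROM `{project_id}.{dataset}.{table}`\nLIMIT {int(limit)};"


def run_verification(project_id: str, location: str) -> list:
    """Run the verification query and return the rows as dictionaries."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id, location=location)
    # The job location should match the dataset location (BQ_LOCATION).
    job = client.query(verification_query(project_id), location=location)
    return [dict(row.items()) for row in job.result()]
```

run_verification relies on Application Default Credentials (or another configured credential) being available in your environment.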
Dataset names follow BigQuery naming rules: hyphens are converted to underscores (for example, atlan-ns becomes atlan_ns). External Iceberg tables are created or replaced in each dataset using CREATE OR REPLACE EXTERNAL TABLE, so the script is safe to re-run for both initial setup and ongoing refresh.
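The name mapping and the statement shape can be sketched as follows. This is an illustration under assumptions, not the script's actual code. The function names are hypothetical, the metadata URI is a placeholder you would obtain from the catalog, and the DDL uses BigQuery's general syntax for Iceberg external tables.

```python
def dataset_name(namespace: str) -> str:
    """Map a Lakehouse namespace to a BigQuery dataset name (hyphens to underscores)."""
    return namespace.replace("-", "_")


def external_table_ddl(
    project_id: str,
    location: str,
    connection_id: str,
    namespace: str,
    table: str,
    metadata_uri: str,
) -> str:
    """Build an idempotent DDL statement for one external Iceberg table."""
    dataset = dataset_name(namespace)
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{project_id}.{dataset}.{table}`\n"
        f"WITH CONNECTION `{project_id}.{location}.{connection_id}`\n"
        f"OPTIONS (format = 'ICEBERG', uris = ['{metadata_uri}']);"
    )
```

Because the statement replaces any existing table definition, running it again points the table at whatever metadata URI is supplied at that time, which is what makes a re-run behave as a refresh.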

Refresh external tables

The external tables don't sync automatically. Re-run the script periodically to keep them up to date with the latest Lakehouse data.

In your terminal, run the script:

python bq_external_iceberg_tables_create_refresh.py

Schedule the script to run on a recurring basis:
- Maximum frequency: No more than once every 30 minutes
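If you schedule refreshes from your own tooling rather than cron or a workflow orchestrator, a small guard can enforce the 30-minute minimum. The helper below is a hypothetical sketch that only computes the wait time; launching the script is left to your scheduler.

```python
from datetime import datetime, timedelta

# Minimum spacing between refresh runs, per the limit stated above.
MIN_INTERVAL = timedelta(minutes=30)


def seconds_until_next_run(
    last_run: datetime, now: datetime, interval: timedelta = MIN_INTERVAL
) -> float:
    """Return how many seconds to wait before the next refresh may start."""
    if interval < MIN_INTERVAL:
        raise ValueError("interval must be at least 30 minutes")
    remaining = (last_run + interval) - now
    # A negative remainder means the interval has already elapsed.
    return max(0.0, remaining.total_seconds())
```

A result of 0.0 means enough time has passed and the script can be run again.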

Troubleshooting

If you have any issues configuring or querying external Iceberg tables in BigQuery, see Troubleshooting BigQuery errors.

Next steps

Now that BigQuery is connected to Lakehouse, you can:

Query Atlan metadata from BigQuery: See the available metadata tables in Entity metadata reference.
Use cases: Explore popular patterns such as metadata enrichment tracking, lineage impact analysis, and glossary alignment in Use cases.
