Crawl Google BigQuery

Extract metadata assets from your Google BigQuery data warehouse into Atlan.

Prerequisites

Before you begin, verify you have:

Completed the Set up Google BigQuery guide or the Set up Workload Identity Federation guide
Access to your Google Cloud project and BigQuery
Reviewed the order of operations

Create crawler workflow

Create a new workflow and select Google BigQuery as your connector source.

In the top-right corner of any screen, select New > New Workflow.
From the list of packages, select BigQuery Assets > Setup Workflow.

Configure extraction

Choose how Atlan connects to Google BigQuery and extracts metadata. Select the extraction method that best fits your organization's security and network requirements:

Direct
Agent

Atlan SaaS connects directly to Google BigQuery (typically via the public BigQuery API or Private Service Connect). This method supports multiple authentication options and lets you test the connection before proceeding.

For Connectivity, choose how you want Atlan to connect to Google BigQuery:
- Public Network: Connect using the public BigQuery API endpoint from Google.
- Private Network Link: Connect through a private endpoint. Contact Atlan support to request the DNS name of the Private Service Connect endpoint. For Host, enter the endpoint in the format https://bigquery-<privateserver>.p.googleapis.com, replacing <privateserver> with the DNS name you received from Atlan support. For Port, keep the default, 443.
Choose an authentication method for your direct connection.

Service account
Workload Identity Federation

Use a service account key for authentication.
- Project Id: Enter the value of project_id from the JSON for the service account you created. This project ID is used to authenticate the connection. You can configure the crawler to extract metadata from additional projects beyond this one.
- Service Account Json: Paste the entire JSON for the service account you created.
- Service Account Email: Enter the value of client_email from the JSON for the service account you created.
After entering the authentication details, click Test Authentication to verify your configuration. If the test is successful, click Next to proceed with the connection configuration.
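
The mapping from the service account key file to the form fields above can be sketched in Python. This is only an illustration: the helper name `extract_connection_fields` and the sample key values are hypothetical, and a real key file contains more fields than shown here.

```python
import json


def extract_connection_fields(service_account_json: str) -> dict:
    """Return the values the form asks for, taken from a service account key."""
    key = json.loads(service_account_json)
    # The form needs project_id and client_email; fail early if either is absent.
    missing = [name for name in ("project_id", "client_email") if not key.get(name)]
    if missing:
        raise ValueError("service account JSON is missing: " + ", ".join(missing))
    return {
        "Project Id": key["project_id"],
        "Service Account Email": key["client_email"],
        # The whole JSON document is pasted as-is into this field.
        "Service Account Json": service_account_json,
    }


# Placeholder key for illustration only (hypothetical values).
sample_key = json.dumps({
    "type": "service_account",
    "project_id": "example-project",
    "client_email": "atlan-crawler@example-project.iam.gserviceaccount.com",
})
fields = extract_connection_fields(sample_key)
print(fields["Project Id"])
print(fields["Service Account Email"])
```

Checking for the two keys before pasting can catch a truncated or wrong key file before you click Test Authentication.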

Authenticate using Workload Identity Federation.
- Project Id: Enter your Google Cloud project ID. This project ID is used to authenticate the connection. You can configure the crawler to extract metadata from additional projects beyond this one.
- Service Account Email: Enter the email of the service account that has BigQuery permissions and is configured for WIF impersonation.
- WIF Pool Provider Id: Enter the full resource name of your WIF provider in the following format:
```
//iam.googleapis.com/projects/<project-number>/locations/global/workloadIdentityPools/<pool-id>/providers/<provider-id>
```
- Atlan OAuth Client Id: Enter the OAuth Client ID you created in Atlan during WIF setup.
- Atlan OAuth Client Secret: Enter the OAuth Client Secret you created in Atlan.
After entering the authentication details, click Test Authentication to verify your configuration. If the test is successful, click Next to proceed with the connection configuration.
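
The WIF provider resource name is easy to mistype, so a small helper that assembles and checks it may be useful. The sketch below follows the format shown above. The function names and sample identifiers are hypothetical, and the check assumes the project number is numeric and that pool and provider IDs contain no slashes.

```python
import re

# Pattern derived from the documented format; the character rules for the
# pool and provider IDs are deliberately loose (an assumption, not a spec).
WIF_PROVIDER_PATTERN = re.compile(
    r"^//iam\.googleapis\.com/projects/(?P<project_number>\d+)"
    r"/locations/global/workloadIdentityPools/(?P<pool_id>[^/]+)"
    r"/providers/(?P<provider_id>[^/]+)$"
)


def build_wif_provider_name(project_number: str, pool_id: str, provider_id: str) -> str:
    """Assemble the full resource name from its three variable parts."""
    return (
        f"//iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/providers/{provider_id}"
    )


def is_valid_wif_provider_name(value: str) -> bool:
    """Return True if the value matches the documented resource name shape."""
    return WIF_PROVIDER_PATTERN.match(value) is not None


# Hypothetical identifiers for illustration.
name = build_wif_provider_name("123456789012", "atlan-pool", "atlan-provider")
print(name)
print(is_valid_wif_provider_name(name))
```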

Atlan's Secure Agent application is deployed within your organization and connects to Google BigQuery. This method provides additional security by keeping connections within your network perimeter.

Install Self-Deployed Runtime if you haven't already:
- Install via Docker Compose
- Install on Kubernetes
For Connectivity, choose how you want Atlan to connect to Google BigQuery:
- Public Network: Connect using the public BigQuery API endpoint from Google.
- Private Network Link: Connect through a private endpoint. Contact Atlan support to request the DNS name of the Private Service Connect endpoint. For Host, enter the endpoint in the format https://bigquery-<privateserver>.p.googleapis.com, replacing <privateserver> with the DNS name you received from Atlan support. For Port, keep the default, 443.
Choose an authentication method for your agent-based connection.

Service account
Workload Identity Federation

Use a service account key for authentication.
- Project Id: Enter the value of project_id from the JSON for the service account you created.
- Service Account Json: Paste the entire JSON for the service account you created.
- Service Account Email: Enter the value of client_email from the JSON for the service account you created.
Click Next to proceed with the connection configuration.

Authenticate using Workload Identity Federation from within your agent environment.
- Project Id: Enter your Google Cloud project ID.
- Service Account Email: Enter the email of the service account that has BigQuery permissions and is configured for WIF impersonation.
- WIF Pool Provider Id: Enter the full resource name of your WIF provider in the following format:
```
//iam.googleapis.com/projects/<project-number>/locations/global/workloadIdentityPools/<pool-id>/providers/<provider-id>
```
- Atlan OAuth Client Id: Enter the OAuth Client ID you created in Atlan during WIF setup.
- Atlan OAuth Client Secret: Enter the OAuth Client Secret you created in Atlan.
Click Next to proceed with the connection configuration.

Configure connection

Set up the connection name and access controls for your Google BigQuery data source in Atlan.

Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
To change the users able to manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection (not even admins).
To prevent users from querying Google BigQuery data, set Allow SQL Query to No.
To prevent users from previewing Google BigQuery data, set Allow Data Preview to No.
At the bottom of the screen, click Next to proceed.

Configure crawler

Before running the crawler, you can configure which assets to include or exclude and other crawler options. Include and exclude metadata filters are only available when using the direct extraction method. If an asset appears in both the include and exclude filters, the exclude filter takes precedence.
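
The precedence rule can be illustrated with a short Python sketch. It models only the behavior described on this page: exclude wins over include, an empty include filter means all assets, and an empty exclude filter means no assets. The function `should_crawl` and the asset names are hypothetical and are not part of Atlan.

```python
def should_crawl(asset: str, include: set, exclude: set) -> bool:
    """Decide whether an asset is crawled under the include/exclude rules."""
    # Exclude takes precedence, even if the asset is also included.
    if asset in exclude:
        return False
    # An empty include filter defaults to all assets.
    return not include or asset in include


# No filters at all: everything is crawled.
print(should_crawl("sales.orders", include=set(), exclude=set()))
# Listed in both filters: the exclude filter wins.
print(should_crawl("sales.orders", include={"sales.orders"}, exclude={"sales.orders"}))
# Include filter set, asset not in it: skipped.
print(should_crawl("hr.salaries", include={"sales.orders"}, exclude=set()))
```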

For Filter Sharded Tables, keep No for the default configuration or click Yes to enable Atlan to catalog and display sharded tables with the same naming prefix as a single table in asset discovery and the lineage graph.
To exclude specific assets from crawling, select Exclude Metadata. This defaults to no assets if none are specified.
To include specific assets in crawling, select Include Metadata. This defaults to all assets if none are specified.
To ignore tables and views based on a naming convention, specify a regular expression in the Exclude regex for tables & views field.
To import existing tags from Google BigQuery to Atlan, for Import Tags, click Yes.
For Advanced Config, keep Default for the default configuration or click Custom if Atlan support has provided you with a custom control configuration:
- Enter the configuration into the Custom Config box. You can also enter {"ignore-all-case": true} to enable crawling assets with case-sensitive identifiers.
- For Hidden Assets, keep No for the default configuration or click Yes to crawl metadata from your hidden datasets in Google BigQuery.
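
Before entering a pattern in the Exclude regex for tables & views field, it can help to try it against a few table names locally. The sketch below uses Python's `re` module and assumes the pattern is searched against the bare table or view name. Atlan's regex flavor and matching semantics may differ, so treat the result as a rough check. The table names and the helper `drop_excluded` are hypothetical.

```python
import re


def drop_excluded(names, exclude_regex):
    """Return the names that do NOT match the exclusion pattern."""
    pattern = re.compile(exclude_regex)
    # re.search finds a match anywhere, so anchor the pattern with ^ or $
    # when you mean "starts with" or "ends with".
    return [name for name in names if not pattern.search(name)]


tables = ["orders", "tmp_orders", "staging_users", "orders_tmp_backup"]
# Exclude names that start with tmp_ or staging_.
kept = drop_excluded(tables, r"^(tmp|staging)_")
print(kept)
```

Anchoring the pattern keeps a name such as orders_tmp_backup, which merely contains tmp_, from being excluded by accident.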

Run crawler

Direct
Agent

Click Preflight checks to validate permissions and configuration before running the crawler. This helps identify any potential issues early.
After the preflight checks pass, you can either:
- Click Run to run the crawler once immediately.
- Click Schedule Run to schedule the crawler to run hourly, daily, weekly, or monthly.

After the crawler finishes running, you can see the assets on Atlan's asset page.
