Mine Google BigQuery
Once you have crawled assets from Google BigQuery, you can mine its query history to construct lineage. The miner supports both Direct and Agent extraction methods.
To mine lineage from Google BigQuery, review the order of operations and then complete the following steps.
Select miner
To select the Google BigQuery miner:
- In the top navigation, click Marketplace.
- Search for BigQuery Miner and select it.
- Click Install.
- Once installation completes, click Setup Workflow on the same tile.
If you navigated away before installation completed, go to New > New Workflow and select BigQuery Miner to proceed.
Configure miner
To configure the Google BigQuery miner:
- For Connection, select the connection to mine. (To select a connection, the crawler must have already run.)
- For Miner Extraction Method, select Query History, Offline, or Agent.
- For Start time, choose the earliest date from which to mine query history.
💪 Did you know? The miner can only query the past two weeks of query history. If you need to mine more history, for example during an initial load, consider using the S3 miner first. After the initial load, you can modify the miner's configuration to use query history extraction.
- (Optional) By default, the miner fetches data from the default region (United States). To fetch data from another region, for Region, select Custom and then, under Custom BigQuery Region, enter the region where your INFORMATION_SCHEMA is hosted. Enter the region in the format region-<REGION>, replacing <REGION> with your specific region - for example, europe-north1.
- To check for any permissions or other configuration issues before running the miner, click Preflight checks.
- At the bottom of the screen, click Next to proceed.
If running the miner for the first time, Atlan recommends setting a start date roughly three days prior to the current date and then scheduling the miner to run daily to build up to two weeks of query history. Mining a full two weeks of query history on the first run may cause delays. Atlan requires a minimum lag of 24 to 48 hours to capture all the relevant transformations that were part of a session. Learn more about the miner logic here.
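The start-time and region constraints above can be sketched in Python. This is an illustrative helper, not part of Atlan's tooling; the function names and constants are assumptions based on the limits stated in this section:

```python
from datetime import datetime, timedelta, timezone

# Maximum query-history lookback supported by the miner (two weeks).
MAX_LOOKBACK = timedelta(days=14)
# Recommended lookback for a first run, per Atlan's guidance above.
RECOMMENDED_INITIAL_LOOKBACK = timedelta(days=3)

def recommended_start_time(now: datetime) -> datetime:
    """Start roughly three days before the current date for an initial run."""
    return now - RECOMMENDED_INITIAL_LOOKBACK

def validate_start_time(start: datetime, now: datetime) -> None:
    """Reject start times older than the two-week query-history window."""
    if now - start > MAX_LOOKBACK:
        raise ValueError(
            "Start time exceeds the two-week query history limit; "
            "use the S3 miner for the initial load instead."
        )

def custom_region(region: str) -> str:
    """Format a custom BigQuery region, e.g. 'europe-north1' -> 'region-europe-north1'."""
    return f"region-{region}"

now = datetime.now(timezone.utc)
validate_start_time(recommended_start_time(now), now)  # 3 days is within the limit
print(custom_region("europe-north1"))  # region-europe-north1
```

A start date 20 days in the past would fail this check, which is exactly the case where the S3 miner is the better choice for the initial load.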
Configure agent extraction
If your organization requires connections to remain within your network perimeter, use the Agent extraction method instead of Direct. To use a Secure Agent, follow these steps:
- Select the Agent tab.
- Install the Self-Deployed Runtime if you haven't already.
- For Connectivity, choose how you want Atlan to connect to Google BigQuery:
- Public Network: Connect using the public BigQuery API endpoint from Google.
- Private Network Link: Connect through a private endpoint. Contact Atlan support to request the DNS name of the Private Service Connect endpoint. For Host, enter the DNS name in the format https://bigquery-<privateserver>.p.googleapis.com, replacing <privateserver> with the DNS name. For Port, 443 is the default.
- Choose an authentication method for your agent-based connection and configure the data source by adding the secret keys for your secret store:
  - Service account:
    - Project Id: Enter the secret key name for project_id.
    - Secret Key for Service Account JSON: Enter the secret key name for the service account JSON. For format requirements, see Configure workflow execution.
    - Secret Key for Service Account Email: Enter the secret key name for client_email.
  - Workload Identity Federation:
    - Project Id: Enter the secret key name for your Google Cloud project ID.
    - Secret Key for Service Account Email: Enter the secret key name for the service account email (used for WIF impersonation).
    - Secret Key for WIF Pool Provider Id: Enter the secret key name for the WIF provider resource name.
    - Secret Key for Atlan OAuth Client Id: Enter the secret key name for the OAuth Client ID created during WIF setup.
    - Secret Key for Atlan OAuth Client Secret: Enter the secret key name for the OAuth Client Secret. For format requirements, see Configure workflow execution.
- Complete the Secure Agent configuration by selecting your secret store and entering the secret path. For details, see Configure workflow execution.
- Click Next after completing the configuration.
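A quick way to sanity-check the private endpoint settings before saving is to validate the host against the format given above. This validator is an illustrative sketch, not part of Atlan's tooling, and the function name is an assumption:

```python
import re

# Expected shape of the Private Service Connect DNS name described above:
# https://bigquery-<privateserver>.p.googleapis.com
PSC_HOST_PATTERN = re.compile(r"^https://bigquery-[a-z0-9-]+\.p\.googleapis\.com$")

def check_agent_endpoint(host: str, port: int = 443) -> bool:
    """Return True if host matches the private endpoint format and port is valid."""
    return bool(PSC_HOST_PATTERN.match(host)) and 0 < port < 65536

print(check_agent_endpoint("https://bigquery-myprivateserver.p.googleapis.com"))  # True
print(check_agent_endpoint("https://bigquery.googleapis.com"))  # False: public endpoint
```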
Configure miner behavior
To configure the Google BigQuery miner behavior:
- (Optional) For Calculate popularity, change to True to retrieve usage and popularity metrics for your Google BigQuery assets from query history:
- For Pricing Model, select the pricing model used to run queries: On Demand to be charged for the number of bytes processed, or Flat Rate to be charged for the number of slots purchased.
- For Popularity Window (days), set the number of days over which to calculate popularity; 30 days is the maximum.
- For Excluded Users, type the names of users to exclude when calculating usage metrics for Google BigQuery assets. Press Enter after each name to add more names.
- (Optional) For Control Config, click Custom to configure the following:
- For Fetch excluded project's QUERY_HISTORY, click Yes to mine query history from databases or projects excluded while crawling metadata from Google BigQuery.
- If Atlan support has provided you with a custom control configuration, enter the configuration into the Custom Config box. You can also:
- (Optional) Enter {"ignore-all-case": true} to enable crawling assets with case-sensitive identifiers.
- (Optional) Enter
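The Custom Config box accepts JSON, so a quick parse catches typos (such as curly quotes pasted from a word processor) before you save the workflow. This check is an illustrative sketch, not an Atlan feature:

```python
import json

# Example custom control configuration from the step above.
custom_config = '{"ignore-all-case": true}'

# Custom Config must be valid JSON; json.loads raises ValueError on typos
# such as curly quotes or trailing commas.
parsed = json.loads(custom_config)
print(parsed)  # {'ignore-all-case': True}
```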
Run miner
To run the Google BigQuery miner, after completing the previous steps:
- To run the miner once immediately, at the bottom of the screen, click the Run button.
- To schedule the miner to run hourly, daily, weekly, or monthly, at the bottom of the screen, click the Schedule & Run button.
Once the miner has finished running, you can see lineage for Google BigQuery assets created between the start time and the miner run.