Set up Google BigQuery
You must be a Google BigQuery administrator to run these commands. For more information, see Google Cloud's Granting, changing, and revoking access to resources.
Atlan extracts metadata from Google BigQuery through read-only access. After you crawl metadata for your Google BigQuery assets, you can mine query history to construct lineage.
If you enable sample data preview or querying, Atlan cost-optimizes previews and queries for tables only. For views and materialized views, Atlan shows a cost nudge before you preview or query data. Learn more in What does Atlan crawl from Google BigQuery?.
You must create a service account to enable Atlan to extract metadata from Google BigQuery. You can create it with either of the following:
- Google Cloud console
- Google Cloud CLI
Choose one method below. Both options create the same IAM role, service account, and service account key.
Prerequisites
Before you begin, make sure you have:
- Google BigQuery administrator access to create custom IAM roles, service accounts, and service account keys
- A Google Cloud project where you want to grant Atlan access
- Access to either the Google Cloud console or the Google Cloud CLI (gcloud)
Permissions
Atlan requires the following permissions to extract metadata from Google BigQuery. Create a custom role with these permissions, then assign the role to the Atlan service account.
Metadata crawling
Add these permissions to the custom role for the baseline metadata crawl:
- bigquery.datasets.get enables Atlan to retrieve metadata about a dataset.
- bigquery.datasets.getIamPolicy enables Atlan to read a dataset's IAM permissions.
- bigquery.jobs.create enables Atlan to run jobs (including queries) within the project.
  Warning: Without this permission, Atlan can't query the source.
- bigquery.routines.get enables Atlan to retrieve routine definitions and metadata.
- bigquery.routines.list enables Atlan to list routines and metadata on routines.
- bigquery.tables.get enables Atlan to retrieve table metadata.
- bigquery.tables.getIamPolicy enables Atlan to read a table's IAM policy.
- bigquery.tables.list enables Atlan to list tables and metadata on tables.
- bigquery.readsessions.create enables Atlan to create a session to stream large results.
- bigquery.readsessions.getData enables Atlan to retrieve data from the session.
- bigquery.readsessions.update enables Atlan to cancel the session.
- resourcemanager.projects.get enables Atlan to retrieve project names and metadata.
Atlan uses the BigQuery tables.get API endpoint to capture metadata. If you crawl external Delta Lake format tables that aren't created as BigLake tables, BigQuery checks the latest Delta Lake checkpoint to detect schema changes. Add these Cloud Storage permissions to the custom role:
- storage.objects.get
- storage.objects.list
For more information, see Creating Delta Lake tables.
Add data preview and querying
Add these permissions to the custom role if you enable data preview or querying for the connection:
- bigquery.tables.getData enables Atlan to retrieve table data.
  Warning: This permission is also required for retrieving metadata such as the row count and update time of a table.
- bigquery.jobs.get enables Atlan to retrieve data and metadata on any job, including queries.
- bigquery.jobs.listAll enables Atlan to list all jobs and retrieve metadata on any job submitted by any user.
- bigquery.jobs.update enables Atlan to cancel any job, including a running query.
Add query history mining
Add these permissions if you mine query history to build lineage.
Atlan currently doesn't support generating lineage from bq cp commands - for example, bq cp <source-table> <destination-table>.
To configure permissions for mining query history, add the following permissions to the custom role:
- bigquery.jobs.listAll enables Atlan to fetch all queries for a project.
- bigquery.jobs.get enables Atlan to access query text for queries.
Crawl tags
Add these permissions to the custom role if you crawl tags or policy tags from Google BigQuery:
- resourcemanager.tagKeys.list enables Atlan to fetch all tag keys.
- resourcemanager.tagValues.list enables Atlan to fetch all tag values for tag keys.
- datacatalog.taxonomies.list enables Atlan to fetch all policy tag taxonomies.
- datacatalog.taxonomies.get enables Atlan to retrieve the details of a policy tag taxonomy.
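The permission groups above can be merged into the single comma-separated value that the --permissions flag of gcloud iam roles create expects. The following is a minimal sketch, not an Atlan-provided script: the join_permissions helper and the variable names are hypothetical, and the groups chosen (baseline crawl, preview and querying, query history mining) are only one possible combination.

```shell
#!/bin/sh
# Permission groups copied from the lists above (space-separated).
BASE_PERMISSIONS="bigquery.datasets.get bigquery.datasets.getIamPolicy bigquery.jobs.create bigquery.routines.get bigquery.routines.list bigquery.tables.get bigquery.tables.getIamPolicy bigquery.tables.list bigquery.readsessions.create bigquery.readsessions.getData bigquery.readsessions.update resourcemanager.projects.get"
PREVIEW_PERMISSIONS="bigquery.tables.getData bigquery.jobs.get bigquery.jobs.listAll bigquery.jobs.update"
MINING_PERMISSIONS="bigquery.jobs.listAll bigquery.jobs.get"

# join_permissions (hypothetical helper): split every argument on spaces,
# drop duplicates, and join what is left with commas.
join_permissions() {
  printf '%s\n' "$@" | tr ' ' '\n' | sed '/^$/d' | sort -u | paste -s -d, -
}

PERMISSIONS=$(join_permissions "$BASE_PERMISSIONS" "$PREVIEW_PERMISSIONS" "$MINING_PERMISSIONS")
printf '%s\n' "$PERMISSIONS"
```

Because bigquery.jobs.get and bigquery.jobs.listAll appear in both the preview and the mining group, de-duplicating before joining keeps the resulting list free of repeats.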
Create custom role and service account
Create a custom role, service account, and service account key for Atlan. You can use the Google Cloud console or Google Cloud CLI.
Google Cloud console
Create custom role
Create a custom role in the Google Cloud console.
- Open the Google Cloud console.
- From the left menu under IAM and admin, click Roles.
- Using the dropdown list at the top of the page, select the project in which you want to create a role.
- From the upper left of the Roles page, click Create Role.
- In the Create role page, enter the following details:
  - For Title, enter a meaningful name for the custom role - for example, Atlan User Role.
  - For Description, enter a description for the custom role if needed.
  - For ID, the Google Cloud console generates a custom role ID based on the custom role name. Edit the ID if necessary - the ID can't be changed later.
  - For Role launch stage, assign a stage if needed - for example, Alpha or General availability.
  - Click Add permissions to select the permissions you want to include in the custom role. In the Add permissions dialog, click the Enter property name or value filter and add the required and any optional permissions.
  - Click Create to finish custom role setup.
Create service account
Create a service account and add the custom role to it.
- Open the Google Cloud console.
- From the left menu under IAM and admin, click Service accounts.
- Select a Google Cloud project.
- From the upper left of the Service accounts page, click Create Service Account.
- For Service account details, enter the following details:
  - For Service account name, enter a service account name to display in the Google Cloud console.
  - For Service account ID, the Google Cloud console generates a service account ID based on this name. Edit the ID if necessary - the ID can't be changed later.
  - For Service account description, enter a description for the service account if needed.
  - Click Create and continue to proceed to the next step.
- For Grant this service account access to the project, enter the following details:
  - Click the Select a role dropdown and then select the custom role you created earlier - for example, Atlan User Role.
  - Click Continue to proceed to the next step.
- Click Done to finish the service account setup.
Create service account key
Create a service account key for crawling Google BigQuery.
- Open the Google Cloud console.
- From the left menu under IAM and admin, click Service accounts.
- Select the Google Cloud project for which you created the service account.
- On the Service accounts page, click the email address of the service account that you want to create a key for.
- From the upper left of your service account page, click the Keys tab.
- On the Keys page, click the Add Key dropdown and then click Create new key.
- In the Create private key dialog, for Key type, click JSON and then click Create. This creates a service account key file. Download the key file and store it in a secure location. You can't download it again.
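After you download the key, it can help to restrict the file's permissions and confirm that it looks like a service account key. The following is a rough sketch, assuming a POSIX shell: the check_key_file helper is hypothetical, and it only looks for the type and client_email fields that Google Cloud writes into JSON service account keys. It doesn't validate the key itself.

```shell
#!/bin/sh
# check_key_file (hypothetical helper): restrict access to the key file and
# run a shallow check that it looks like a JSON service account key.
check_key_file() {
  key_file=$1
  [ -f "$key_file" ] || { echo "missing: $key_file" >&2; return 1; }

  # Only the file owner should be able to read the private key.
  chmod 600 "$key_file"

  grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key_file" || {
    echo "not a service account key: $key_file" >&2
    return 1
  }
  grep -Eq '"client_email"[[:space:]]*:' "$key_file" || {
    echo "no client_email field: $key_file" >&2
    return 1
  }
  echo "ok: $key_file"
}

# Usage (the path is a placeholder for wherever you stored the download):
# check_key_file <key_file_path>
```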
Google Cloud CLI
Prerequisites
Set up the Google Cloud CLI in any one of the following development environments:
- Cloud Shell - to use an online terminal with the gcloud CLI already set up, activate Cloud Shell:
  - To launch a Cloud Shell session from the Google Cloud console, open the Google Cloud console, and from the top right, click the Activate Cloud Shell icon.
  - A Cloud Shell session starts and displays a command-line prompt. It can take a few seconds for the session to initialize.
- Local shell - to use a local development environment, install and initialize the gcloud CLI.
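Whichever environment you pick, gcloud needs to be on the PATH before you run the commands that follow. A minimal sketch of that check is shown here; require_tool is a hypothetical helper, not part of the gcloud CLI.

```shell
#!/bin/sh
# require_tool (hypothetical helper): succeed only if the named command
# can be found by the current shell.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "not found: $1" >&2
    return 1
  fi
}

# Usage: confirm gcloud is available, then print the active project.
# require_tool gcloud && gcloud config get-value project
```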
Create custom role
To create a custom role with the requisite and any optional permissions, run the following command:
gcloud iam roles create atlanUserRole --project=<project_id> \
  --title="Atlan User Role" --description="Atlan User Role to extract metadata" \
  --permissions="bigquery.datasets.get,bigquery.datasets.getIamPolicy,bigquery.jobs.create,bigquery.readsessions.create,bigquery.readsessions.getData,bigquery.readsessions.update,bigquery.routines.get,bigquery.routines.list,bigquery.tables.get,bigquery.tables.getIamPolicy,bigquery.tables.list,resourcemanager.projects.get" \
  --stage=ALPHA
- Replace <project_id> with the project ID of your Google Cloud project.
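As an alternative to a long --permissions string, gcloud iam roles create also accepts a YAML role definition through its --file flag. The sketch below writes such a file with the baseline metadata crawling permissions. The write_role_file helper and the atlan-role.yaml file name are hypothetical; the title, description, and stage mirror the command above.

```shell
#!/bin/sh
# write_role_file (hypothetical helper): write a role definition for
# `gcloud iam roles create --file`. $1 is the output path; every remaining
# argument is a permission to include.
write_role_file() {
  out=$1
  shift
  {
    printf 'title: "Atlan User Role"\n'
    printf 'description: "Atlan User Role to extract metadata"\n'
    printf 'stage: ALPHA\n'
    printf 'includedPermissions:\n'
    for permission in "$@"; do
      printf '%s %s\n' '-' "$permission"
    done
  } > "$out"
}

# Hypothetical output location for the example.
ROLE_FILE="${TMPDIR:-/tmp}/atlan-role.yaml"

write_role_file "$ROLE_FILE" \
  bigquery.datasets.get bigquery.datasets.getIamPolicy bigquery.jobs.create \
  bigquery.readsessions.create bigquery.readsessions.getData bigquery.readsessions.update \
  bigquery.routines.get bigquery.routines.list \
  bigquery.tables.get bigquery.tables.getIamPolicy bigquery.tables.list \
  resourcemanager.projects.get
cat "$ROLE_FILE"

# Then create the role from the file (not run by this sketch):
# gcloud iam roles create atlanUserRole --project=<project_id> --file="$ROLE_FILE"
```

Keeping the permissions in a file makes later additions, such as the preview or query history permissions, easier to review than edits to a single long command line.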
Create service account
To create a service account, run the following command. Service account IDs must be 6 to 30 characters long and can contain only lowercase letters, digits, and hyphens:
gcloud iam service-accounts create atlan-user \
  --description="Atlan Service Account to extract metadata" \
  --display-name="Atlan User"
To add your custom role to your service account, run the following command. Custom roles are referenced by their full path, projects/<project_id>/roles/<role_id>:
gcloud projects add-iam-policy-binding <project_id> \
  --member="serviceAccount:atlan-user@<project_id>.iam.gserviceaccount.com" \
  --role="projects/<project_id>/roles/atlanUserRole"
- Replace <project_id> with the project ID of your Google Cloud project.
Create service account key
To create a service account key, run the following command:
gcloud iam service-accounts keys create <key_file_path> \
  --iam-account="atlan-user@<project_id>.iam.gserviceaccount.com"
- Replace <key_file_path> with a path to a new output file for the private key - for example, ~/atlan-user-private-key.json.
- Replace <project_id> with the project ID of your Google Cloud project.
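To confirm that the custom role is bound to the service account, you can list the roles the account holds in the project. This is a sketch rather than a required step: list_member_roles is a hypothetical wrapper around gcloud projects get-iam-policy, and it takes the service account email as an argument instead of assuming a particular account name.

```shell
#!/bin/sh
# list_member_roles (hypothetical helper): print the roles bound to a
# service account in a project.
#   $1 = project ID
#   $2 = service account email
list_member_roles() {
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:$2" \
    --format="value(bindings.role)"
}

# Usage (placeholders as in the commands above):
# list_member_roles <project_id> <service_account_email>
```

If the binding succeeded, the output is expected to include the custom role in the form projects/<project_id>/roles/atlanUserRole.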
Troubleshooting
If you run into permission or authentication issues, see Troubleshoot Google BigQuery connectivity. If you still need help, contact Atlan support.
Next steps
- Crawl Google BigQuery: Create a connection and run the crawler to extract metadata from Google BigQuery