Set up Google BigQuery

Who can do this?

You must be a Google BigQuery administrator to complete these steps. For more information, see Google Cloud's Granting, changing, and revoking access to resources.

Atlan extracts metadata from Google BigQuery through read-only access. After you crawl metadata for your Google BigQuery assets, you can mine query history to construct lineage.

If you enable sample data preview or querying, Atlan cost-optimizes previews and queries for tables only. For views and materialized views, Atlan shows a cost nudge before you preview or query data. Learn more in What does Atlan crawl from Google BigQuery?.

You must create a service account to enable Atlan to extract metadata from Google BigQuery. You can create the service account with either of the following:

  • Google Cloud console
  • Google Cloud CLI

Choose one method below. Both options create the same IAM role, service account, and service account key.

Prerequisites

Before you begin, make sure you have:

  • Google BigQuery administrator access to create custom IAM roles, service accounts, and service account keys
  • A Google Cloud project where you want to grant Atlan access
  • Access to either the Google Cloud console or the Google Cloud CLI (gcloud)

Permissions

Atlan requires the following permissions to extract metadata from Google BigQuery. Create a custom role with these permissions, then assign the role to the Atlan service account.

Metadata crawling

These permissions cover the baseline metadata crawl. Add all of them to the custom role:

  • bigquery.datasets.get enables Atlan to retrieve metadata about a dataset.

  • bigquery.datasets.getIamPolicy enables Atlan to read a dataset's IAM permissions.

  • bigquery.jobs.create enables Atlan to run jobs (including queries) within the project.

    Warning: Without this permission, Atlan can't query the source.

  • bigquery.routines.get enables Atlan to retrieve routine definitions and metadata.

  • bigquery.routines.list enables Atlan to list routines and metadata on routines.

  • bigquery.tables.get enables Atlan to retrieve table metadata.

  • bigquery.tables.getIamPolicy enables Atlan to read a table's IAM policy.

  • bigquery.tables.list enables Atlan to list tables and metadata on tables.

  • bigquery.readsessions.create enables Atlan to create a session to stream large results.

  • bigquery.readsessions.getData enables Atlan to retrieve data from the session.

  • bigquery.readsessions.update enables Atlan to cancel the session.

  • resourcemanager.projects.get enables Atlan to retrieve project names and metadata.

Atlan uses the BigQuery tables.get API endpoint to capture metadata. If you crawl external Delta Lake format tables that aren't created as BigLake tables, BigQuery reads the latest Delta Lake checkpoint from Cloud Storage to detect schema changes. In that case, also add these Cloud Storage permissions to the custom role:

  • storage.objects.get
  • storage.objects.list

For more information, see Creating Delta Lake tables.
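The baseline permissions can also be kept in a role definition file rather than selected by hand. The following is a minimal sketch in Python, not an official Atlan script: it assembles the permissions listed above into the title, description, stage, and includedPermissions fields that Google Cloud uses for custom role definitions. The function name build_role_definition, the crawl_delta_lake flag, and the example description are assumptions made for illustration.

```python
import json

# Baseline metadata-crawl permissions, copied from the list above.
BASELINE_PERMISSIONS = [
    "bigquery.datasets.get",
    "bigquery.datasets.getIamPolicy",
    "bigquery.jobs.create",
    "bigquery.routines.get",
    "bigquery.routines.list",
    "bigquery.tables.get",
    "bigquery.tables.getIamPolicy",
    "bigquery.tables.list",
    "bigquery.readsessions.create",
    "bigquery.readsessions.getData",
    "bigquery.readsessions.update",
    "resourcemanager.projects.get",
]

# Only needed for external Delta Lake tables that aren't BigLake tables.
DELTA_LAKE_PERMISSIONS = ["storage.objects.get", "storage.objects.list"]


def build_role_definition(title, description, crawl_delta_lake=False):
    """Return a custom role definition as a plain dict (hypothetical helper)."""
    permissions = list(BASELINE_PERMISSIONS)
    if crawl_delta_lake:
        permissions += DELTA_LAKE_PERMISSIONS
    return {
        "title": title,
        "description": description,
        "stage": "GA",
        "includedPermissions": permissions,
    }


if __name__ == "__main__":
    role = build_role_definition(
        "Atlan User Role",
        "Read-only metadata access for Atlan",  # example text, not prescribed
        crawl_delta_lake=True,
    )
    # Print the definition as JSON so it can be saved to a file and reviewed.
    print(json.dumps(role, indent=2))
```

Keeping the list in one place makes it easier to review the role against this page when permissions change.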

Add data preview and querying

If you enable data preview or querying for the connection, also add the following permissions to the custom role:

  • bigquery.tables.getData enables Atlan to retrieve table data.

    Warning: This permission is also required for retrieving metadata such as the row count and update time of a table.

  • bigquery.jobs.get enables Atlan to retrieve data and metadata on any job, including queries.

  • bigquery.jobs.listAll enables Atlan to list all jobs and retrieve metadata on any job submitted by any user.

  • bigquery.jobs.update enables Atlan to cancel any job, including a running query.

Add query history mining

If you mine query history to build lineage, also add the following permissions to the custom role. Atlan currently doesn't generate lineage from bq cp commands, for example bq cp <source-table> <destination-table>.

  • bigquery.jobs.listAll enables Atlan to fetch all queries for a project.
  • bigquery.jobs.get enables Atlan to access query text for queries.

Crawl tags

If you crawl tags or policy tags from Google BigQuery, also add the following permissions to the custom role:

  • resourcemanager.tagKeys.list enables Atlan to fetch all tag keys.
  • resourcemanager.tagValues.list enables Atlan to fetch all tag values for tag keys.
  • datacatalog.taxonomies.list enables Atlan to fetch all policy tag taxonomies.
  • datacatalog.taxonomies.get enables Atlan to retrieve the details of a policy tag taxonomy.
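The optional permission groups overlap: bigquery.jobs.get and bigquery.jobs.listAll appear under both data preview and query history mining. The sketch below, offered as an illustration rather than an Atlan tool, merges whichever groups you enable into one de-duplicated list. The feature keys (preview_and_query, query_history_mining, tags) and the function name optional_permissions are hypothetical labels chosen for this example.

```python
# Optional permission groups, copied from the sections above.
# The dictionary keys are hypothetical labels, not Atlan or Google identifiers.
OPTIONAL_PERMISSIONS = {
    "preview_and_query": [
        "bigquery.tables.getData",
        "bigquery.jobs.get",
        "bigquery.jobs.listAll",
        "bigquery.jobs.update",
    ],
    "query_history_mining": [
        "bigquery.jobs.listAll",
        "bigquery.jobs.get",
    ],
    "tags": [
        "resourcemanager.tagKeys.list",
        "resourcemanager.tagValues.list",
        "datacatalog.taxonomies.list",
        "datacatalog.taxonomies.get",
    ],
}


def optional_permissions(*features):
    """Return the sorted, de-duplicated permissions for the chosen features."""
    selected = set()
    for feature in features:
        # An unknown feature name raises KeyError instead of being ignored.
        selected.update(OPTIONAL_PERMISSIONS[feature])
    return sorted(selected)


if __name__ == "__main__":
    for permission in optional_permissions("preview_and_query", "query_history_mining"):
        print(permission)
```

Enabling both preview and mining therefore adds four permissions to the role, not six.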

Create custom role and service account

Create a custom role, service account, and service account key for Atlan. You can use the Google Cloud console or Google Cloud CLI.

Create custom role

Create a custom role in the Google Cloud console.

  1. Open the Google Cloud console.
  2. From the left menu under IAM and admin, click Roles.
  3. Using the dropdown list at the top of the page, select the project in which you want to create a role.
  4. From the upper left of the Roles page, click Create Role.
  5. In the Create role page, enter the following details:
    1. For Title, enter a meaningful name for the custom role - for example, Atlan User Role.
    2. For Description, enter a description for the custom role if needed.
    3. For ID, the Google Cloud console generates a custom role ID based on the custom role name. Edit the ID if necessary - the ID can't be changed later.
    4. For Role launch stage, assign a stage if needed - for example, Alpha or General availability.
    5. Click Add permissions to select the permissions you want to include in the custom role. In the Add permissions dialog, click the Enter property name or value filter, then add the required permissions and any optional permissions you need.
    6. Click Create to finish custom role setup.
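If you prefer the Google Cloud CLI, the console fields above map onto flags of gcloud iam roles create. The following Python sketch only builds the command as an argument list so you can review it before running it. The helper name role_create_command and the sample values (my-project, atlan_user_role, and the shortened permission list) are placeholders assumed for this example.

```python
import shlex


def role_create_command(project_id, role_id, title, permissions,
                        description=None, stage="GA"):
    """Build the argument list for `gcloud iam roles create` (hypothetical helper)."""
    command = [
        "gcloud", "iam", "roles", "create", role_id,
        f"--project={project_id}",
        f"--title={title}",
        f"--stage={stage}",
        # gcloud expects the permissions as one comma-separated value.
        "--permissions=" + ",".join(permissions),
    ]
    if description:
        command.append(f"--description={description}")
    return command


if __name__ == "__main__":
    command = role_create_command(
        project_id="my-project",        # placeholder project ID
        role_id="atlan_user_role",      # placeholder role ID
        title="Atlan User Role",
        permissions=["bigquery.datasets.get", "bigquery.tables.get"],  # shortened
    )
    # Print a shell-quoted form for review; pass `command` to subprocess.run to execute.
    print(shlex.join(command))
```

As in the console, choose the role ID carefully, because it can't be changed after the role is created.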

Create service account

Create a service account and add the custom role to it.

  1. Open the Google Cloud console.
  2. From the left menu under IAM and admin, click Service accounts.
  3. Select a Google Cloud project.
  4. From the upper left of the Service accounts page, click Create Service Account.
  5. For Service account details, enter the following details:
    1. For Service account name, enter a service account name to display in the Google Cloud console.
    2. For Service account ID, the Google Cloud console generates a service account ID based on this name. Edit the ID if necessary - the ID can't be changed later.
    3. For Service account description, enter a description for the service account if needed.
    4. Click Create and continue to proceed to the next step.
  6. For Grant this service account access to the project, enter the following details:
    1. Click the Select a role dropdown and then select the custom role you created earlier - for example, Atlan User Role.
    2. Click Continue to proceed to the next step.
  7. Click Done to finish the service account setup.
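The same two actions, creating the service account and granting it the custom role, can be sketched for the Google Cloud CLI. This Python example builds the gcloud iam service-accounts create and gcloud projects add-iam-policy-binding commands as argument lists. It assumes the custom role was created at the project level; the helper names and the sample values (my-project, atlan-crawler, atlan_user_role) are placeholders for illustration.

```python
def service_account_email(account_id, project_id):
    """Return the email address Google Cloud assigns to a service account."""
    return f"{account_id}@{project_id}.iam.gserviceaccount.com"


def service_account_commands(project_id, account_id, display_name, role_id):
    """Build the gcloud commands to create the account and bind the custom role."""
    email = service_account_email(account_id, project_id)
    create = [
        "gcloud", "iam", "service-accounts", "create", account_id,
        f"--project={project_id}",
        f"--display-name={display_name}",
    ]
    bind = [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{email}",
        # Project-level custom roles are referenced as projects/<project>/roles/<id>.
        f"--role=projects/{project_id}/roles/{role_id}",
    ]
    return [create, bind]


if __name__ == "__main__":
    commands = service_account_commands(
        project_id="my-project",      # placeholder project ID
        account_id="atlan-crawler",   # placeholder service account ID
        display_name="Atlan crawler",
        role_id="atlan_user_role",    # placeholder custom role ID
    )
    for command in commands:
        print(" ".join(command))
```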

Create service account key

Create a service account key for crawling Google BigQuery.

  1. Open the Google Cloud console.
  2. From the left menu under IAM and admin, click Service accounts.
  3. Select the Google Cloud project for which you created the service account.
  4. On the Service accounts page, click the email address of the service account that you want to create a key for.
  5. From the upper left of your service account page, click the Keys tab.
  6. On the Keys page, click the Add Key dropdown and then click Create new key.
  7. In the Create private key dialog, for Key type, click JSON and then click Create. This creates a service account key file. Download the key file and store it in a secure location. You can't download it again.
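Before you upload the key file anywhere, it can help to confirm that it is a complete service account key. The sketch below is an optional sanity check, not part of Atlan: it looks for a handful of fields that Google Cloud writes into JSON service account keys. The function names and the path atlan-key.json are assumptions, and the check doesn't verify that the key is valid or has the right permissions.

```python
import json

# Fields that a JSON service account key file is expected to contain.
REQUIRED_KEY_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
)


def key_file_problems(key):
    """Return a list of problems found in a parsed key file (empty if none)."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_KEY_FIELDS
        if not key.get(name)
    ]
    # Other credential files (for example user credentials) use a different type.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def check_key_file(path):
    """Load a key file from disk and return its problems."""
    with open(path, encoding="utf-8") as handle:
        return key_file_problems(json.load(handle))


if __name__ == "__main__":
    import sys

    # Defaults to a hypothetical file name; pass the real path as an argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "atlan-key.json"
    try:
        found = check_key_file(path)
    except FileNotFoundError:
        print(f"key file not found: {path}")
    else:
        for problem in found:
            print(problem)
```

The script never prints the private key itself, so its output is safe to share when you ask for help.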

Troubleshooting

If you run into permission or authentication issues, see Troubleshoot Google BigQuery connectivity. If you still need help, contact Atlan support.

Next steps

  • Crawl Google BigQuery: Create a connection and run the crawler to extract metadata from Google BigQuery