Crawl Google Cloud Knowledge Catalog
Configure and run the crawler to extract metadata from Google Cloud Knowledge Catalog. The crawler discovers Knowledge Catalog entries and their associated Aspect metadata.
Prerequisites
Before you begin, make sure you have:
- Set up Google Cloud Knowledge Catalog authentication with a service account and the required permissions
- Either a Google Cloud Service Account JSON key file, or Workload Identity Federation (WIF) credentials configured
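If you use a service account key, it can save a failed Test Authentication round-trip to confirm the downloaded key file is structurally complete before uploading it. The sketch below is a hypothetical helper (not part of Atlan or Google's tooling) that checks for the fields Google includes in every service-account JSON key:

```python
import json

# Fields present in every Google Cloud service-account JSON key file.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email", "token_uri"}

def check_service_account_key(path: str) -> list[str]:
    """Return a sorted list of required fields missing from the key file.

    An empty list means the file is structurally complete; it does NOT
    prove the key is valid or has the right permissions.
    """
    with open(path) as f:
        key = json.load(f)
    return sorted(REQUIRED_FIELDS - key.keys())
```

For example, `check_service_account_key("atlan-kc-key.json")` returns `[]` for an intact key file, or the names of any missing fields otherwise.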
Create crawler workflow
To crawl metadata from Google Cloud Knowledge Catalog, review the order of operations and then complete the following steps.
- In the top navigation, click Marketplace.
- Search for Google Knowledge Catalog and select it.
- Click Install.
- Once installation completes, click Setup Workflow on the same tile.
If you navigated away before installation completed, go to New > New Workflow and select Google Knowledge Catalog to proceed.
Configure authentication
- For Connectivity, choose how you want Atlan to connect to Google Knowledge Catalog:
  - Public Endpoint: Connect using Google's public Knowledge Catalog API endpoint.
  - Private Service Connect: Connect through a private endpoint. Contact Atlan support to request the DNS name of the Private Service Connect endpoint, then enter that DNS name in PSC Hostname.
- Choose an authentication method: Service account key or Workload Identity Federation (WIF).

  For Service account key, provide:
  - Service Account JSON: Select the Google Cloud service account credential with Knowledge Catalog permissions that you created during setup.
  - Project ID: Enter the Google Cloud project ID associated with your service account.

  For Workload Identity Federation (WIF), provide:
  - Project ID: Enter the Google Cloud project ID associated with your service account.
  - Service Account Email: Enter the email address of the service account to impersonate (for example, `atlan-kc@<project-id>.iam.gserviceaccount.com`).
  - WIF Pool Provider ID: Enter the WIF pool provider resource name in this format: `//iam.googleapis.com/projects/<project-number>/locations/global/workloadIdentityPools/<pool-id>/providers/<provider-id>`
  - Atlan OAuth Client ID: Enter the OAuth client ID from Atlan that was used when configuring the WIF provider.
  - Atlan OAuth Client Secret: Enter the corresponding OAuth client secret.

- After entering the authentication details, click Test Authentication to verify your configuration. If the test is successful, click Next to proceed.
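A malformed WIF Pool Provider ID is a common cause of authentication test failures. As an illustration of the expected resource-name shape (this regex and helper are assumptions for this sketch, not Atlan's actual validation), you could check the value before entering it:

```python
import re

# Shape of the WIF Pool Provider ID field:
# //iam.googleapis.com/projects/<project-number>/locations/global
#   /workloadIdentityPools/<pool-id>/providers/<provider-id>
WIF_PROVIDER_RE = re.compile(
    r"^//iam\.googleapis\.com/projects/(\d+)"
    r"/locations/global/workloadIdentityPools/([a-z0-9-]+)"
    r"/providers/([a-z0-9-]+)$"
)

def parse_wif_provider(resource_name: str) -> tuple[str, str, str]:
    """Return (project_number, pool_id, provider_id), or raise ValueError."""
    m = WIF_PROVIDER_RE.match(resource_name)
    if not m:
        raise ValueError(f"Not a valid WIF provider resource name: {resource_name!r}")
    return m.groups()
```

Note that the resource name uses the numeric project number, not the project ID, and starts with a double slash.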
Configure connection
Set up the connection name and access controls for your Google Cloud Knowledge Catalog data source in Atlan.
- Provide a Connection Name that represents your source environment. For example, you might use values like `production`, `development`, or `knowledge-catalog`.
- To change the users able to manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection (not even admins).
- At the bottom of the screen, click Next to proceed.
Configure crawler
Configure which Knowledge Catalog entries to extract and which optional features to enable.
- Connection: Select the BigQuery connection whose assets the Knowledge Catalog entries are linked to. This is required, since Knowledge Catalog Aspects and scan results are attached to BigQuery assets in Atlan.
- Include Projects: (Optional) Enter one or more GCP project IDs to restrict the crawl to those projects. If not specified, all projects accessible to the service account are ingested.
- Exclude Projects: (Optional) Enter one or more GCP project IDs to skip during the crawl.
- Include Aspect Types: (Optional) Select specific Aspect Types to include. If specified, only entries using these Aspect Types are extracted. Leave empty to extract all Aspects.
- Exclude Aspect Types: (Optional) Select Aspect Types to exclude. Entries using these Aspect Types are skipped during extraction.
- Ingest Data Quality: (Optional) Enable to extract Data Quality scan results and attach them to the corresponding BigQuery assets. When enabled, the last 7 run results per scan are ingested. Requires the `dataplex.datascans.list`, `dataplex.datascans.get`, and `dataplex.datascans.getData` permissions.
- Ingest Data Profiling: (Optional) Enable to extract Data Profiling scan results and attach them to the corresponding BigQuery assets. Requires the `dataplex.datascans.list`, `dataplex.datascans.get`, and `dataplex.datascans.getData` permissions.
- Enable Aspects Reverse Sync: (Optional) Enable to allow Aspect field values edited in Atlan to be written back to Knowledge Catalog. Reverse sync targets BigQuery Tables, Views, Materialized Views, Columns, Schemas, and Routines. Disabled by default.
- At the bottom of the screen, click Next to proceed.
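The include/exclude options above follow a common filtering convention: an empty include list means "everything", and an exclusion always wins over an inclusion. The sketch below illustrates that convention (it is an assumption about precedence for illustration, not Atlan's internal implementation):

```python
def should_crawl(item: str, include: set[str], exclude: set[str]) -> bool:
    """Decide whether a project or Aspect Type is crawled.

    Convention assumed here: excludes take precedence, and an empty
    include set means no restriction (crawl everything not excluded).
    """
    if item in exclude:
        return False
    return not include or item in include
```

Under this convention, listing a project in both Include Projects and Exclude Projects would skip it.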
Run crawler
- Click Preflight checks to validate permissions and configuration before running the crawler.
- After the preflight checks pass, you can either:
- Click Run to run the crawler once immediately.
- Click Schedule Run to schedule the crawler to run hourly, daily, weekly, or monthly.
Once the crawler has completed running, you can view the crawled assets on Atlan's assets page. Monitor progress in the Workflows section and check the Logs tab for detailed execution information.
See also
- What does Atlan crawl from Knowledge Catalog: Learn what assets, metadata, and lineage Atlan crawls from Knowledge Catalog.