
Crawl Google Cloud Knowledge Catalog

Configure and run the crawler to extract metadata from Google Cloud Knowledge Catalog. The crawler discovers Knowledge Catalog entries and their associated Aspect metadata.

Prerequisites

Before you begin, make sure you have:

Create crawler workflow

To crawl metadata from Google Cloud Knowledge Catalog, review the order of operations and then complete the following steps.

  1. In the top navigation, click Marketplace.
  2. Search for Google Knowledge Catalog and select it.
  3. Click Install.
  4. Once installation completes, click Setup Workflow on the same tile.

If you navigated away before installation completed, go to New > New Workflow and select Google Knowledge Catalog to proceed.

Configure authentication

  1. For Connectivity, choose how you want Atlan to connect to Google Knowledge Catalog:

    • Public Endpoint: Connect using the public Knowledge Catalog API endpoint from Google.
    • Private Service Connect: Connect through a private endpoint. Contact Atlan support to request the DNS name of the Private Service Connect endpoint. For PSC Hostname, enter the DNS name provided.
  2. Choose an authentication method and enter its details:

    • Service Account JSON: Select the Google Cloud Service Account credential with Knowledge Catalog permissions that you created during setup.
    • Project ID: Enter the Google Cloud project ID associated with your service account.
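
If you script or pre-validate this configuration outside Atlan, the Connectivity choice reduces to picking an API host. The sketch below is a hypothetical helper, not part of Atlan: the option names public and psc, the function names, and the public host dataplex.googleapis.com are all assumptions. It only checks that the PSC Hostname is a bare DNS name (no scheme, port, or path). It does not resolve the name or test the connection.

```python
import re

# Assumption: the public endpoint is the Dataplex API host; confirm against
# Google's documentation for your environment.
PUBLIC_ENDPOINT = "dataplex.googleapis.com"

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending with a hyphen.
_DNS_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_bare_hostname(hostname: str) -> bool:
    """True if hostname is a dotted DNS name with no scheme, port, or path."""
    name = hostname.rstrip(".")
    if not name or len(name) > 253:
        return False
    labels = name.split(".")
    return len(labels) >= 2 and all(_DNS_LABEL.fullmatch(label) for label in labels)


def resolve_endpoint(connectivity: str, psc_hostname: str = "") -> str:
    """Return the API host for a Connectivity choice ('public' or 'psc', hypothetical names)."""
    if connectivity == "public":
        return PUBLIC_ENDPOINT
    if connectivity == "psc":
        host = psc_hostname.strip()
        if not is_bare_hostname(host):
            raise ValueError(f"PSC Hostname must be a bare DNS name, got {psc_hostname!r}")
        return host
    raise ValueError(f"unknown connectivity option: {connectivity!r}")


print(resolve_endpoint("psc", "knowledge-catalog.psc.example.internal"))
# → knowledge-catalog.psc.example.internal
```

A value such as https://knowledge-catalog.psc.example.internal is rejected, because the PSC Hostname field expects only the DNS name that Atlan support provides.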

After entering the authentication details, click Test Authentication to verify your configuration. If the test is successful, click Next to proceed.
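
Test Authentication is the authoritative check. Before you upload a key, a quick local sanity check can catch common mistakes such as selecting the wrong file or mistyping the project ID. The sketch below is a hypothetical helper: it reads the standard fields of a Google Cloud service account key file (type, project_id, private_key, client_email) and never contacts Google Cloud, so it cannot confirm that the key is active or that it has Knowledge Catalog permissions.

```python
import json

# Standard fields of a Google Cloud service account key file (JSON).
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_json: str, project_id: str) -> list[str]:
    """Return problems found locally in a key/Project ID pair; an empty list means none.

    Hypothetical helper: it never contacts Google Cloud, so it cannot replace
    Test Authentication.
    """
    try:
        key = json.loads(key_json)
    except json.JSONDecodeError as exc:
        return [f"key is not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["key must be a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"expected type 'service_account', got {key['type']!r}")

    wanted = project_id.strip()
    if not wanted:
        problems.append("Project ID is empty")
    elif key.get("project_id") and key["project_id"] != wanted:
        # Flagged for review rather than treated as certainly wrong.
        problems.append(f"Project ID {wanted!r} differs from the key's project_id {key['project_id']!r}")
    return problems


sample_key = json.dumps({
    "type": "service_account",
    "project_id": "my-gcp-project",
    "private_key": "placeholder-private-key",
    "client_email": "atlan-crawler@my-gcp-project.iam.gserviceaccount.com",
})
print(check_service_account_key(sample_key, "my-gcp-project"))  # → []
```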

Configure connection

Set up the connection name and access controls for your Google Cloud Knowledge Catalog data source in Atlan.

  1. Provide a Connection Name that represents your source environment. For example, you might use values like production, development, or knowledge-catalog.
  2. To change who can manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection (not even admins).
  3. At the bottom of the screen, click Next to proceed.

Configure crawler

Configure which Knowledge Catalog entries to extract and which optional features to enable.

  1. Connection: Select the BigQuery connection whose assets your Knowledge Catalog entries are linked to. This field is required because Knowledge Catalog Aspects and scan results are attached to BigQuery assets in Atlan.

  2. Include Projects: (Optional) Enter one or more GCP project IDs to restrict the crawl to those projects. If not specified, all projects accessible to the service account are ingested.

  3. Exclude Projects: (Optional) Enter one or more GCP project IDs to skip during the crawl.

  4. Include Aspect Types: (Optional) Select specific Aspect Types to include. If specified, only entries using these Aspect Types are extracted. Leave empty to extract all Aspects.

  5. Exclude Aspect Types: (Optional) Select Aspect Types to exclude. Entries using these Aspect Types are skipped during extraction.

  6. Ingest Data Quality: (Optional) Enable to extract Data Quality scan results and attach them to the corresponding BigQuery assets. When enabled, the last 7 run results per scan are ingested. Requires dataplex.datascans.list, dataplex.datascans.get, and dataplex.datascans.getData permissions.

  7. Ingest Data Profiling: (Optional) Enable to extract Data Profiling scan results and attach them to the corresponding BigQuery assets. Requires dataplex.datascans.list, dataplex.datascans.get, and dataplex.datascans.getData permissions.

  8. Enable Aspects Reverse Sync: (Optional) Enable to write Aspect field values edited in Atlan back to Knowledge Catalog. Reverse sync targets BigQuery Tables, Views, Materialised Views, Columns, Schemas, and Routines. Disabled by default.

  9. At the bottom of the screen, click Next to proceed.
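
The selection options above behave like a set of filters. The sketch below models them in plain Python under stated assumptions: Entry is a hypothetical, simplified stand-in for a Knowledge Catalog entry, Aspect Type names such as pii are made up, Exclude is assumed to win when a project or Aspect Type appears in both lists, and Aspect Type filters are applied per entry, as the step descriptions read. latest_runs mirrors the documented rule of keeping the last 7 run results per Data Quality scan. It illustrates the documented behavior; it is not Atlan's implementation.

```python
from collections.abc import Set
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Entry:
    """Hypothetical, simplified view of a Knowledge Catalog entry."""
    name: str
    project: str
    aspect_types: set[str] = field(default_factory=set)


def project_selected(project: str, include: Set[str], exclude: Set[str]) -> bool:
    # An empty Include Projects list means every project the service account can access.
    # Assumption: Exclude Projects wins if a project is in both lists.
    if project in exclude:
        return False
    return not include or project in include


def aspects_selected(aspect_types: Set[str], include: Set[str], exclude: Set[str]) -> bool:
    # Skip an entry that uses any excluded Aspect Type; otherwise keep it if
    # Include Aspect Types is empty or the entry uses at least one included type.
    if aspect_types & exclude:
        return False
    return not include or bool(aspect_types & include)


def select_entries(entries, include_projects=frozenset(), exclude_projects=frozenset(),
                   include_aspect_types=frozenset(), exclude_aspect_types=frozenset()):
    """Return the entries that pass both the project and the Aspect Type filters."""
    return [
        e for e in entries
        if project_selected(e.project, include_projects, exclude_projects)
        and aspects_selected(e.aspect_types, include_aspect_types, exclude_aspect_types)
    ]


def latest_runs(run_times: list[datetime], limit: int = 7) -> list[datetime]:
    """Keep the newest `limit` run timestamps for one scan, newest first."""
    return sorted(run_times, reverse=True)[:limit]


entries = [
    Entry("orders", "sales-prod", {"data-contract"}),
    Entry("orders_tmp", "sales-dev", {"data-contract"}),
    Entry("customers", "sales-prod", {"pii", "data-contract"}),
    Entry("events", "analytics-prod"),
]
kept = select_entries(entries, include_projects={"sales-prod", "analytics-prod"},
                      exclude_aspect_types={"pii"})
print([e.name for e in kept])  # → ['orders', 'events']
```

In this example, orders_tmp is dropped by Include Projects and customers is dropped by Exclude Aspect Types, while events is kept because no Include Aspect Types are set.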

Run crawler

  1. Click Preflight checks to validate permissions and configuration before running the crawler.
  2. After the preflight checks pass, you can either:
    • Click Run to run the crawler once immediately.
    • Click Schedule Run to schedule the crawler to run hourly, daily, weekly, or monthly.
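
Preflight checks validate permissions for you. If you want to check a service account ahead of time, the sketch below computes which of the datascan permissions listed in this section are missing for the options you enabled. The helper names are hypothetical, and BASE_PERMISSIONS is a placeholder left empty on purpose: this section does not list the permissions needed for the core entry and Aspect crawl, so fill it in from your setup guide. The granted set could come from, for example, the Cloud Resource Manager projects.testIamPermissions method, which returns the subset of requested permissions that the caller holds.

```python
# Permissions this section lists for Ingest Data Quality and Ingest Data Profiling.
DATASCAN_PERMISSIONS = frozenset({
    "dataplex.datascans.list",
    "dataplex.datascans.get",
    "dataplex.datascans.getData",
})

# Placeholder (hypothetical): permissions for the core crawl are not listed in
# this section; fill them in from your setup guide.
BASE_PERMISSIONS: frozenset[str] = frozenset()


def required_permissions(ingest_data_quality: bool, ingest_data_profiling: bool,
                         base: frozenset[str] = BASE_PERMISSIONS) -> set[str]:
    """Union of the base permissions and any permissions the enabled options need."""
    required = set(base)
    if ingest_data_quality or ingest_data_profiling:
        required |= DATASCAN_PERMISSIONS
    return required


def missing_permissions(granted: set[str], ingest_data_quality: bool = False,
                        ingest_data_profiling: bool = False,
                        base: frozenset[str] = BASE_PERMISSIONS) -> list[str]:
    """Return required permissions absent from the granted set, sorted for stable output."""
    needed = required_permissions(ingest_data_quality, ingest_data_profiling, base)
    return sorted(needed - set(granted))


granted = {"dataplex.datascans.list", "dataplex.datascans.get"}
print(missing_permissions(granted, ingest_data_quality=True))
# → ['dataplex.datascans.getData']
print(missing_permissions(granted))
# → []
```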

Once the crawler has finished running, you can see the crawled assets on Atlan's assets page. Monitor progress in the Workflows section and check the Logs tab for detailed execution information.
