
Crawl Google Cloud Dataplex

Configure and run the crawler to extract metadata from Google Cloud Dataplex. The crawler discovers Dataplex entries and their associated aspect metadata.

Prerequisites

Before you begin, make sure you have:

Create crawler workflow

To crawl metadata from Google Cloud Dataplex, review the order of operations and then complete the following steps.

  1. In the top right corner of any screen, click New and then click New Workflow.
  2. From the list of packages, select Google Dataplex and click Setup Workflow.

Configure authentication

  1. Service Account JSON: Select the Google Cloud Service Account credential with Dataplex permissions that you created during setup.

  2. Project ID: Enter the Google Cloud project ID where your Dataplex catalog is configured. The connector automatically discovers and searches across all available locations within the project.

  3. Click Next to proceed to the configuration step.
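Before you select the credential, you may want to sanity-check the key file and project ID locally. The following is a minimal, stdlib-only sketch and is not part of Atlan or the connector.

- It checks for the standard service account key fields.
- It checks the documented Google Cloud project ID format: 6 to 30 characters of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen.
- The optional `list_locations` helper calls the Dataplex REST `projects.locations.list` endpoint to show roughly which locations the project exposes. It assumes you already have an OAuth 2.0 access token for the service account (for example, from `gcloud auth activate-service-account --key-file=KEY.json` followed by `gcloud auth print-access-token`). It reads only the first page of results.

All function names are hypothetical.

```python
import json
import re
import urllib.request

# Documented project ID format: 6-30 chars of lowercase letters, digits, and
# hyphens, starting with a letter and not ending with a hyphen.
# (Legacy domain-scoped IDs such as "example.com:proj" are not handled.)
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")

# Fields found in a standard Google Cloud service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_project_id(project_id: str) -> bool:
    """Return True if project_id matches the Google Cloud project ID format."""
    return PROJECT_ID_RE.fullmatch(project_id) is not None


def check_service_account_key(key_text: str) -> list[str]:
    """Return a list of problems with a service account key (empty if none)."""
    try:
        key = json.loads(key_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["key must be a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if name not in key]
    if "type" in key and key["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems


def list_locations(project_id: str, access_token: str) -> list[str]:
    """Return location IDs from the first page of Dataplex projects.locations.list.

    Hypothetical helper: requires network access and a valid OAuth 2.0 token.
    """
    url = f"https://dataplex.googleapis.com/v1/projects/{project_id}/locations"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return [loc.get("locationId", loc.get("name", "")) for loc in body.get("locations", [])]
```

An HTTP 403 from `list_locations` often indicates missing IAM permissions or a Dataplex API that is not enabled in the project.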

Configure connection

Complete the connection configuration for your Dataplex environment:

  1. Connection: (Optional) Select a BigQuery connection to limit the crawl to entries associated with that connection. If you leave this empty, all available entries are crawled.

  2. Include Aspect Types: (Optional) Select specific aspect types to include. If specified, only entries using these aspect types are extracted. Leave empty to include all aspect types.

  3. Exclude Aspect Types: (Optional) Select aspect types to exclude. Entries using these aspect types are skipped during extraction.

  4. Preflight Check: Click to run preflight checks and verify Google Dataplex API accessibility, authentication, and permissions before running the crawler.

  5. At the bottom of the screen, click Next to proceed.
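The steps above do not spell out how the connection filter and the include and exclude aspect-type filters combine. The sketch below shows one plausible reading under stated assumptions:

- Filters apply at the entry level.
- An entry matches the include list if it uses at least one listed aspect type.
- Exclusion wins when an entry matches both lists.

`Entry`, `should_crawl`, and the aspect type names are hypothetical, and the connector's actual behavior may differ.

```python
from dataclasses import dataclass, field
from typing import AbstractSet, Optional


@dataclass
class Entry:
    """Hypothetical, simplified view of a Dataplex entry."""
    name: str
    aspect_types: set[str] = field(default_factory=set)
    connection: Optional[str] = None  # BigQuery connection the entry belongs to


def should_crawl(
    entry: Entry,
    connection: Optional[str] = None,
    include: AbstractSet[str] = frozenset(),
    exclude: AbstractSet[str] = frozenset(),
) -> bool:
    """Decide whether an entry is crawled under the assumed filter semantics."""
    # Connection filter: when a BigQuery connection is selected, keep only its entries.
    if connection is not None and entry.connection != connection:
        return False
    # Assumption: exclusion takes precedence over inclusion.
    if entry.aspect_types & exclude:
        return False
    # Assumption: a non-empty include list keeps entries using at least one listed type.
    if include and not (entry.aspect_types & include):
        return False
    return True


entries = [
    Entry("orders", {"schema", "usage"}, "bq-prod"),
    Entry("scratch", {"schema", "pii"}, "bq-prod"),
    Entry("events", {"schema"}, "bq-dev"),
]
kept = [e.name for e in entries if should_crawl(e, connection="bq-prod", exclude={"pii"})]
print(kept)  # ['orders']
```

Under these assumptions, leaving all three filters empty crawls every entry, which matches the "if not specified" behavior described above.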

Run crawler

After completing the configuration:

  • To run the crawler once immediately, click Run at the bottom of the screen.
  • To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.
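Each schedule frequency corresponds to a familiar cron pattern. The mapping below uses standard five-field cron syntax only to illustrate what each option means. It assumes runs start at a fixed minute, hour, weekday, or day that you choose, and it does not describe how Atlan stores or exposes schedules. `cron_for` is a hypothetical helper.

```python
def cron_for(frequency: str, minute: int = 0, hour: int = 0, weekday: int = 0, day: int = 1) -> str:
    """Return a five-field cron expression (minute hour day-of-month month day-of-week)."""
    # Basic range checks for the fields we fill in.
    if not (0 <= minute <= 59 and 0 <= hour <= 23 and 0 <= weekday <= 6 and 1 <= day <= 28):
        raise ValueError("field out of range")
    patterns = {
        "hourly": f"{minute} * * * *",               # every hour at :minute
        "daily": f"{minute} {hour} * * *",           # every day at hour:minute
        "weekly": f"{minute} {hour} * * {weekday}",  # weekday 0 = Sunday
        "monthly": f"{minute} {hour} {day} * *",     # day capped at 28 so every month has it
    }
    try:
        return patterns[frequency]
    except KeyError:
        raise ValueError(f"unsupported frequency: {frequency!r}") from None


print(cron_for("weekly", hour=3, weekday=1))  # 0 3 * * 1
```

Capping the monthly day at 28 is a design choice that avoids months where a given day does not exist.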

Verify crawled assets

After the crawler finishes running, you can see the crawled assets on Atlan's assets page.

  1. Monitor the crawler progress in the Workflows section:
    • View real-time status updates
    • Check the Logs tab for detailed execution information
    • Wait for the status to show Success before proceeding
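If you prefer to script the wait instead of watching the Workflows section, the logic reduces to a polling loop. This is a generic sketch. `get_status` is a hypothetical callable you would supply, for example a wrapper around whatever status source you use. Status names other than `Success` are assumptions.

```python
import time
from typing import Callable

FAILURE_STATUSES = {"Failed", "Error"}  # hypothetical names; only "Success" comes from the docs


def wait_for_success(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status until it returns "Success"; raise on failure or timeout."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "Success":
            return status
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"crawler ended with status {status!r}; check the Logs tab")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.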
