Crawl Google Cloud Dataplex

Configure and run the crawler to extract metadata from Google Cloud Dataplex. The crawler discovers Dataplex entries and their associated aspect metadata.
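
Under the hood, Dataplex exposes entries through its catalog API, where entries are organized into entry groups. The following Python sketch is not the crawler's actual implementation; it only illustrates what discovering entries looks like with the google-cloud-dataplex client. The project ID my-project and the location us are placeholder assumptions.

```python
from google.cloud import dataplex_v1

# Placeholder assumptions: substitute your own project ID and location.
client = dataplex_v1.CatalogServiceClient()
parent = "projects/my-project/locations/us"

# Entries live inside entry groups: walk each group, then its entries.
for group in client.list_entry_groups(parent=parent):
    for entry in client.list_entries(parent=group.name):
        print(entry.name, entry.entry_type)
```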

Prerequisites

Before you begin, make sure you have:

  • A Google Cloud service account credential with the required Dataplex permissions, created during setup

Create crawler workflow

To crawl metadata from Google Cloud Dataplex, review the order of operations and then complete the following steps.

  1. In the top right corner of any screen, click New and then click New Workflow.
  2. From the list of packages, select Google Dataplex and click Setup Workflow.

Configure authentication

  1. Service Account JSON: Select the Google Cloud Service Account credential with Dataplex permissions that you created during setup.

  2. Location: Enter the Google Cloud locations where your Dataplex aspect types and entries are configured. Provide all locations where aspect types or entries (tables) exist, separated by commas (for example, global, us or global, region-us).

    You must provide every location where aspect types are defined and every location where entries (tables) are registered. For example, if the aspect type university-rank-system is defined in the global location and the tables users_table and orders_table are registered in the us location, provide both: global, us. Providing only global means the tables in us won't be found; providing only us means the aspect types in global won't be discovered. A minimal verification sketch follows these steps.

  3. Click Next to proceed to the configuration step.
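
As a quick sanity check before running the workflow, you can confirm that the credential can actually see aspect types in every location you entered. This is a minimal sketch, not part of the product: the key file path key.json, the project ID my-project, and the location list are placeholder assumptions.

```python
from google.cloud import dataplex_v1
from google.oauth2 import service_account

# Placeholder assumptions: key file path, project ID, and locations.
creds = service_account.Credentials.from_service_account_file(
    "key.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = dataplex_v1.CatalogServiceClient(credentials=creds)

# Check every location you plan to enter in the Location field.
for location in ["global", "us"]:
    parent = f"projects/my-project/locations/{location}"
    aspect_types = list(client.list_aspect_types(parent=parent))
    print(f"{location}: {len(aspect_types)} aspect types visible")
```

If a location reports zero aspect types you expected to see, revisit the Location field and the service account's permissions before proceeding.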

Configure connection

Complete the connection configuration for your Dataplex environment:

  1. Connection: (Optional) Select a BigQuery connection to filter entries. This limits the crawl to entries associated with the selected BigQuery connection. If not specified, all available entries are crawled.

  2. Include Aspect Types: (Optional) Select specific aspect types to include. If specified, only entries using these aspect types are extracted. Leave empty to extract all aspects.

    Note that aspect types can be used by entries in different locations. For example, an aspect type defined in the global location can be used by tables in the us location. Both locations must be provided in the Location field for complete discovery.

  3. Exclude Aspect Types: (Optional) Select aspect types to exclude. Entries using these aspects are skipped during extraction.

  4. Preflight Check: Click to run preflight checks that verify Google Dataplex API accessibility, authentication, and permissions before running the crawler. A conceptual sketch of these checks follows this list.

  5. At the bottom of the screen, click Next to proceed.
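
Atlan doesn't document what the preflight check runs internally, but conceptually it reduces to the probes sketched below: reach the API with the configured credential, read entries and their aspects, and apply the include/exclude aspect-type filters. This is a hedged illustration only; the project ID my-project, the locations, and the aspect-type keys are placeholder assumptions, and the exact format of the keys in entry.aspects may differ by client version.

```python
from google.api_core import exceptions
from google.cloud import dataplex_v1

client = dataplex_v1.CatalogServiceClient()

# Placeholder assumptions: include/exclude aspect-type keys.
include = set()                      # empty set = include everything
exclude = {"some-excluded-aspect"}   # hypothetical aspect-type key

for location in ["global", "us"]:
    parent = f"projects/my-project/locations/{location}"
    try:
        for group in client.list_entry_groups(parent=parent):
            for listed in client.list_entries(parent=group.name):
                # Re-fetch with the FULL view so the aspects map is populated.
                entry = client.get_entry(
                    request=dataplex_v1.GetEntryRequest(
                        name=listed.name,
                        view=dataplex_v1.EntryView.FULL,
                    )
                )
                # Keys of entry.aspects identify the aspect types attached
                # to this entry; apply include/exclude as set membership.
                keys = set(entry.aspects.keys())
                if include and not keys & include:
                    continue  # uses none of the included aspect types
                if keys & exclude:
                    continue  # uses an excluded aspect type
                print("would crawl:", entry.name)
    except exceptions.PermissionDenied as err:
        print(f"permission problem in {location}: {err}")
```

If this sketch raises a PermissionDenied error for any location, the workflow's preflight check will most likely fail for the same reason.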

Run crawler

After completing the configuration:

  • To run the crawler once immediately, click Run at the bottom of the screen.
  • To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.

Verify crawled assets

Once the crawler has finished running, you can view the crawled assets on Atlan's assets page.

  1. Monitor the crawler progress in the Workflows section:
    • View real-time status updates
    • Check the Logs tab for detailed execution information
    • Wait for the status to show Success before proceeding
