Crawl Google Cloud Dataplex
Configure and run the crawler to extract metadata from Google Cloud Dataplex. The crawler discovers Dataplex entries and their associated aspect metadata.
Prerequisites
Before you begin, make sure you have:
- Set up Google Cloud Dataplex authentication with a service account and required permissions
- Google Cloud Service Account credential JSON file downloaded and ready to upload
Create crawler workflow
To crawl metadata from Google Cloud Dataplex, review the order of operations and then complete the following steps.
- In the top right corner of any screen, click New and then click New Workflow.
- From the list of packages, select Google Dataplex and click Setup Workflow.
Configure authentication
- Service Account JSON: Select the Google Cloud Service Account credential with Dataplex permissions that you created during setup.
- Location: Enter the Google Cloud location where your Dataplex aspect types are defined. The connector automatically expands the location to search for assets based on the following rules:
  - `global` searches for assets across all GCP locations
  - A multi-region such as `us`, `eu`, or `asia` searches all regions within that multi-region
  - A single region such as `us-central1` searches only that specific region

  You can provide multiple locations separated by commas (for example, `global` or `us, eu`). The `region-` prefix is also supported (for example, `region-global`, `region-us`).
- Click Next to proceed to the configuration step.
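The location-expansion rules above can be sketched as follows. This is an illustrative sketch only, not the connector's actual implementation, and the multi-region mapping is a small hypothetical sample rather than an exhaustive list of GCP regions:

```python
# Hypothetical sample of regions per multi-region (not exhaustive).
MULTI_REGIONS = {
    "us": ["us-central1", "us-east1", "us-west1"],
    "eu": ["europe-west1", "europe-west2"],
    "asia": ["asia-east1", "asia-northeast1"],
}

def expand_locations(value: str) -> list[str]:
    """Expand a comma-separated Location value into concrete search scopes."""
    scopes = []
    for raw in value.split(","):
        loc = raw.strip().removeprefix("region-")  # 'region-us' behaves like 'us'
        if loc == "global":
            scopes.append("*")                      # search all GCP locations
        elif loc in MULTI_REGIONS:
            scopes.extend(MULTI_REGIONS[loc])       # every region in the multi-region
        else:
            scopes.append(loc)                      # a single region, e.g. us-central1
    return scopes

print(expand_locations("region-us, asia"))
```

For example, `expand_locations("region-us, asia")` would return every sample region under both `us` and `asia`, while `expand_locations("us-central1")` returns only that region.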
Configure connection
Complete the connection configuration for your Dataplex environment:
- Connection: (Optional) Select a BigQuery connection to filter entries. This limits the crawl to entries associated with the selected BigQuery connection. If not specified, all available entries are crawled.
- Include Aspect Types: (Optional) Select specific aspect types to include. If specified, only entries using these aspect types are extracted. Leave empty to extract all aspects.

  Note that aspect types can be used by entries in different locations. For example, an aspect type defined in the `global` location can be used by tables in the `us` location. Both locations must be provided in the Location field for complete discovery.
- Exclude Aspect Types: (Optional) Select aspect types to exclude. Entries using these aspect types are skipped during extraction.
- Preflight Check: Click to run preflight checks and verify Google Dataplex API accessibility, authentication, and permissions before running the crawler.
- At the bottom of the screen, click Next to proceed.
Run crawler
After completing the configuration:
- To run the crawler once, immediately, click Run at the bottom of the screen.
- To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.
Verify crawled assets
Once the crawler has finished running, you can view the crawled assets on Atlan's assets page.
- Monitor the crawler progress in the Workflows section:
- View real-time status updates
- Check the Logs tab for detailed execution information
- Wait for the status to show Success before proceeding
See also
- What does Atlan crawl from Dataplex: Learn what assets, metadata, and lineage Atlan crawls from Dataplex