Crawl Google Cloud Dataplex
Configure and run the crawler to extract metadata from Google Cloud Dataplex. The crawler discovers Dataplex entries and their associated aspect metadata.
Prerequisites
Before you begin, make sure you have:
- Set up Google Cloud Dataplex authentication with a service account and the required permissions
- Downloaded the Google Cloud Service Account credential JSON file and have it ready to upload
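Before uploading the credential file, it can help to confirm it is a well-formed service account key. The sketch below checks for the standard fields found in a Google Cloud service account key file; the check itself is illustrative and not part of the Atlan crawler.

```python
import json

# Standard fields present in a Google Cloud service account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}

def validate_service_account_json(path: str) -> list:
    """Return a list of problems; an empty list means the file looks valid."""
    with open(path) as f:
        creds = json.load(f)
    problems = sorted(REQUIRED_KEYS - creds.keys())
    if creds.get("type") != "service_account":
        problems.append("type must be 'service_account'")
    return problems
```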
Create crawler workflow
To crawl metadata from Google Cloud Dataplex, review the order of operations and then complete the following steps.
- In the top right corner of any screen, click New and then click New Workflow.
- From the list of packages, select Google Dataplex and click Setup Workflow.
Configure authentication
- Service Account JSON: Select the Google Cloud Service Account credential with Dataplex permissions that you created during setup.
- Project ID: Enter the Google Cloud project ID where your Dataplex catalog is configured. The connector automatically discovers and searches across all available locations within the project.
- Click Next to proceed to the configuration step.
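A common source of failed crawls is entering a project name or project number instead of the project ID. Google Cloud project IDs are 6-30 characters of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. The pre-check below is an illustrative sketch under those rules, not part of the Atlan connector.

```python
import re

# Project ID format: 6-30 chars, lowercase letters / digits / hyphens,
# must start with a letter and cannot end with a hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")

def is_valid_project_id(project_id: str) -> bool:
    """Check that a string matches the Google Cloud project ID format."""
    return bool(PROJECT_ID_RE.fullmatch(project_id))
```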
Configure connection
Complete the connection configuration for your Dataplex environment:
- Connection: (Optional) Select a BigQuery connection to filter entries. This limits the crawl to entries associated with the selected BigQuery connection. If no connection is specified, all available entries are crawled.
- Include Aspect Types: (Optional) Select specific aspect types to include. If specified, only entries using these aspect types are extracted. Leave empty to extract all aspects.
- Exclude Aspect Types: (Optional) Select aspect types to exclude. Entries using these aspect types are skipped during extraction.
- Preflight Check: Click to run preflight checks that verify Google Dataplex API accessibility, authentication, and permissions before running the crawler.
- At the bottom of the screen, click Next to proceed.
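The include/exclude filters described above can be read as a simple predicate: exclude takes precedence, and an include list, when present, restricts extraction to entries using at least one listed aspect type. The sketch below illustrates that semantics; the entry shape and function are assumptions for illustration, not the connector's actual implementation.

```python
from typing import Optional, Set

def should_extract(entry_aspects: Set[str],
                   include: Optional[Set[str]] = None,
                   exclude: Optional[Set[str]] = None) -> bool:
    """Decide whether an entry is extracted given include/exclude filters."""
    if exclude and entry_aspects & exclude:
        return False  # entry uses an excluded aspect type: skip it
    if include:
        # include filter set: entry must use at least one included aspect type
        return bool(entry_aspects & include)
    return True  # no filters matched: extract the entry
```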
Run crawler
After completing the configuration:
- To run the crawler once, immediately, click Run at the bottom of the screen.
- To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.
Verify crawled assets
Once the crawler has finished running, you can view the crawled assets on Atlan's assets page.
- Monitor the crawler progress in the Workflows section:
- View real-time status updates
- Check the Logs tab for detailed execution information
- Wait for the status to show Success before proceeding
See also
- What does Atlan crawl from Dataplex: Learn what assets, metadata, and lineage Atlan crawls from Dataplex