Crawl Google Cloud Dataplex
Configure and run the crawler to extract metadata from Google Cloud Dataplex. The crawler discovers Dataplex entries and their associated aspect metadata.
Prerequisites
Before you begin, make sure you have:
- Set up Google Cloud Dataplex authentication with a service account and required permissions
- The Google Cloud Service Account credential JSON file downloaded and ready to upload
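Before uploading the key, you can sanity-check that it loads and belongs to the expected project. The sketch below uses the google-auth library; the file name is a placeholder.

```python
# Minimal sanity check for the downloaded service account key.
# KEY_PATH is a placeholder -- point it at your actual key file.
from google.oauth2 import service_account

KEY_PATH = "dataplex-crawler-key.json"

creds = service_account.Credentials.from_service_account_file(
    KEY_PATH,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
print(f"Service account: {creds.service_account_email}")
print(f"Project: {creds.project_id}")
```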
Create crawler workflow
To crawl metadata from Google Cloud Dataplex, review the order of operations and then complete the following steps.
- In the top right corner of any screen, click New and then click New Workflow.
- From the list of packages, select Google Dataplex and click Setup Workflow.
Configure authentication
- Service Account JSON: Select the Google Cloud Service Account credential with Dataplex permissions that you created during setup.
- Location: Enter the Google Cloud locations where your Dataplex aspect types and entries are configured, separated by commas (for example, `global, us` or `global, region-us`). You must provide every location where aspect types are defined and where entries (tables) are registered. For example, if the aspect type `university-rank-system` is defined in the `global` location and the tables `users_table` and `orders_table` are registered in the `us` location, you must provide both: `global, us`. If you provide only one location, the connector may not discover all resources: only `global` means the tables in `us` won't be found, and only `us` means the aspect types in `global` won't be discovered. To find the relevant locations, see the sketch after these steps.
- Click Next to proceed to the configuration step.
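If you're unsure which locations hold your aspect types and entries, you can probe each candidate location with the google-cloud-dataplex Python client before filling in the Location field. This is a sketch, assuming Application Default Credentials are configured in your environment; PROJECT_ID and the candidate list are placeholders.

```python
# Probe candidate locations for Dataplex aspect types and entry groups.
# Every location that prints here should go into the Location field.
from google.cloud import dataplex_v1

PROJECT_ID = "my-project"                     # placeholder
CANDIDATES = ["global", "us", "us-central1"]  # placeholder locations to probe

client = dataplex_v1.CatalogServiceClient()   # uses Application Default Credentials

for location in CANDIDATES:
    parent = f"projects/{PROJECT_ID}/locations/{location}"
    aspect_types = list(client.list_aspect_types(parent=parent))
    entry_groups = list(client.list_entry_groups(parent=parent))
    if aspect_types or entry_groups:
        print(f"{location}: {len(aspect_types)} aspect types, "
              f"{len(entry_groups)} entry groups")
```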
Configure connection
Complete the connection configuration for your Dataplex environment:
- Connection: (Optional) Select a BigQuery connection to filter entries. This limits the crawl to entries associated with the selected BigQuery connection. If not specified, all available entries are crawled.
- Include Aspect Types: (Optional) Select specific aspect types to include. If specified, only entries using these aspect types are extracted; leave empty to extract all aspects. Note that aspect types can be used by entries in different locations: for example, an aspect type defined in the `global` location can be used by tables in the `us` location, and both locations must be provided in the Location field for complete discovery.
- Exclude Aspect Types: (Optional) Select aspect types to exclude. Entries using these aspects are skipped during extraction.
- Preflight Check: Click to run preflight checks that verify Google Dataplex API accessibility, authentication, and permissions before running the crawler. A rough equivalent of these checks is sketched after these steps.
- At the bottom of the screen, click Next to proceed.
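For illustration, the sketch below approximates what a preflight check covers: it loads the service account key, calls the Dataplex Catalog API, and separates permission errors from connectivity failures. This is an assumption about what the preflight verifies, not Atlan's actual implementation, and all names are placeholders.

```python
# Hand-rolled preflight: load the key, hit the Dataplex API, classify errors.
from google.api_core import exceptions
from google.cloud import dataplex_v1
from google.oauth2 import service_account

KEY_PATH = "dataplex-crawler-key.json"  # placeholder
PROJECT_ID = "my-project"               # placeholder
LOCATION = "global"                     # one of your configured locations

creds = service_account.Credentials.from_service_account_file(KEY_PATH)
client = dataplex_v1.CatalogServiceClient(credentials=creds)

try:
    # A single list call exercises API reachability, authentication,
    # and aspect-type read permission at once.
    list(client.list_aspect_types(
        parent=f"projects/{PROJECT_ID}/locations/{LOCATION}"))
    print("Preflight OK: API reachable and credentials authorized.")
except exceptions.PermissionDenied as err:
    print(f"Permission denied -- check the service account's IAM roles: {err}")
except exceptions.GoogleAPICallError as err:
    print(f"API call failed: {err}")
```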
Run crawler
After completing the configuration:
- To run the crawler once, immediately, click Run at the bottom of the screen.
- To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.
Verify crawled assets
Once the crawler has finished running, you can see the assets on Atlan's assets page.
- Monitor the crawler's progress in the Workflows section:
  - View real-time status updates
  - Check the Logs tab for detailed execution information
  - Wait for the status to show Success before proceeding
See also
- What does Atlan crawl from Dataplex: Learn what assets, metadata, and lineage Atlan crawls from Dataplex