Crawl AWS Glue
Configure and run the AWS Glue crawler to extract metadata from your AWS Glue Data Catalog into Atlan. This enables you to discover, catalog, and govern your AWS Glue jobs, workflows, and data transformations alongside your other data assets.
Prerequisites
Before you begin, make sure you have:
- Configured AWS Glue access permissions
- Admin or connection admin privileges in Atlan
- Reviewed the order of operations for running workflows
Create crawler workflow
To crawl metadata from AWS Glue:
- In the top right corner of any screen, navigate to New and then click New Workflow.
- From the list of packages, select Glue Assets, and click Setup Workflow.
Choose extraction method
Select your extraction method and configure the necessary credentials for AWS Glue access.
- Direct extraction
- Agent extraction
Direct extraction connects Atlan directly to your AWS Glue service to crawl metadata.
- Configure authentication based on the method you set up when configuring AWS Glue access permissions:
For IAM User authentication:
- Enter the AWS Access Key you configured
- Enter the AWS Secret Key you configured
- Enter the Region of your AWS Glue deployment
For IAM Role authentication:
- Set the AWS Role ARN to the ARN of the role you created in your AWS account
- (Optional) Under External ID, click the Generate button. Then click the copy button to its right and use the generated ID when setting up your trust policy
- Enter the Region of your AWS Glue deployment
- Click Test Authentication to confirm connectivity to AWS Glue.
- Once the test succeeds, click Next at the bottom of the screen.
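For IAM Role authentication, the generated External ID typically ends up as a condition in the role's trust policy. The following is a minimal sketch of that policy document, assuming the standard AWS `sts:AssumeRole` trust policy shape. The function name `build_trust_policy`, the principal ARN, and the External ID value are hypothetical placeholders, not values supplied by Atlan; use the principal and ID given in your own setup.

```python
import json
from typing import Optional


def build_trust_policy(principal_arn: str, external_id: Optional[str] = None) -> dict:
    """Build an IAM trust policy document allowing principal_arn to assume the role.

    If external_id is given, the role can only be assumed when the caller
    passes that same External ID (the sts:ExternalId condition key).
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": principal_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        statement["Condition"] = {
            "StringEquals": {"sts:ExternalId": external_id}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Placeholder values for illustration only (assumptions, not real identifiers).
    policy = build_trust_policy(
        principal_arn="arn:aws:iam::123456789012:root",
        external_id="example-generated-external-id",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be compared against the trust relationship of the role whose ARN you enter in the AWS Role ARN field.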
Agent extraction uses Atlan's Secure Agent to execute metadata extraction within your organization's environment.
- Configure the AWS Glue data source by adding the secret keys for your secret store based on your authentication method:
For IAM User authentication:
- Add the secret key for AWS Access Key
- Add the secret key for AWS Secret Key
- Add the secret key for Region
For IAM Role authentication:
- Add the secret key for AWS Role ARN
- (Optional) Add the secret key for External ID
- Add the secret key for Region
- Complete the Secure Agent configuration by following the instructions in Configure Secure Agent for workflow execution.
- Click Next after completing the configuration.
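Before filling in the agent extraction form, it can help to check that your secret store holds a key for every field your authentication method needs. The sketch below encodes the two field lists from the steps above. The field identifiers (such as `aws_access_key`) and the example secret key names are hypothetical labels chosen for illustration, not names defined by Atlan or by any secret store.

```python
from typing import Dict, List

# Fields required per authentication method, mirroring the lists above.
# The identifiers are illustrative assumptions, not Atlan field names.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "iam_user": ["aws_access_key", "aws_secret_key", "region"],
    "iam_role": ["aws_role_arn", "region"],
}

# External ID is optional for IAM Role authentication.
OPTIONAL_FIELDS: Dict[str, List[str]] = {
    "iam_user": [],
    "iam_role": ["external_id"],
}


def missing_secret_keys(auth_method: str, secret_key_names: Dict[str, str]) -> List[str]:
    """Return the required fields that have no secret key name assigned."""
    return [
        field
        for field in REQUIRED_FIELDS[auth_method]
        if not secret_key_names.get(field)
    ]


if __name__ == "__main__":
    # Maps each form field to the key under which its value is stored
    # in your secret store (hypothetical names).
    configured = {"aws_role_arn": "glue/role-arn", "external_id": "glue/external-id"}
    print(missing_secret_keys("iam_role", configured))
```

An empty list means every required field has a secret key assigned; optional fields are never reported as missing.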
Configure connection
Complete the connection configuration for your AWS Glue environment:
- Provide a Connection Name that represents your source environment. For example, you might want to use values like production, development, gold, or analytics.
- To change the users able to manage this connection, change the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, not even admins.
- At the bottom of the screen, click Next to proceed.
Configure crawler
Configure the AWS Glue crawler settings to control which assets are included in the metadata extraction. If an asset appears in both the include and exclude filters, the exclude filter takes precedence.
- Include Metadata: Select assets you want to include in crawling. This defaults to all assets if none are specified.
- Exclude Metadata: Select assets you want to exclude from crawling. This defaults to no assets if none are specified.
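The filter rules above (include defaults to everything, exclude defaults to nothing, and exclude wins on conflict) can be sketched as a small function. This only illustrates the stated semantics and is not Atlan's implementation; the function name `select_assets` and the asset names are hypothetical, and plain strings stand in for whatever assets the filters list.

```python
from typing import Iterable, List, Optional


def select_assets(
    all_assets: List[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return the assets that would be crawled under the include/exclude rules."""
    # An empty or missing include filter means "include all assets".
    included = set(include) if include else set(all_assets)
    # An empty or missing exclude filter means "exclude no assets".
    excluded = set(exclude) if exclude else set()
    # Exclude takes precedence: an asset in both filters is dropped.
    return [a for a in all_assets if a in included and a not in excluded]


if __name__ == "__main__":
    assets = ["sales", "hr", "finance"]  # illustrative names only
    print(select_assets(assets))
    print(select_assets(assets, include=["sales", "hr"], exclude=["hr"]))
```

In the second call, `hr` appears in both filters, so it is left out of the result.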
Run crawler
After completing the configuration:
- To run the crawler once, immediately, at the bottom of the screen click Run.
- To schedule the crawler to run hourly, daily, weekly or monthly, at the bottom of the screen click Schedule & Run.
Once the crawler has finished running, you can see the assets on Atlan's assets page! 🎉
See also
- What does Atlan crawl from AWS Glue - Learn about the AWS Glue assets and metadata that Atlan discovers and catalogs.