Crawl AWS Glue

Configure and run the AWS Glue crawler to extract metadata from your AWS Glue Data Catalog into Atlan. This enables you to discover, catalog, and govern your AWS Glue jobs, workflows, and data transformations alongside your other data assets.

Prerequisites

Before you begin, make sure you have:

Create crawler workflow

To crawl metadata from AWS Glue:

  1. In the top right corner of any screen, navigate to New and then click New Workflow.
  2. From the list of packages, select Glue Assets, and click Setup Workflow.

Choose extraction method

Select your extraction method and configure the necessary credentials for AWS Glue access.

Direct extraction connects Atlan directly to your AWS Glue service to crawl metadata.

  1. Configure authentication based on the method you set up when configuring AWS Glue access permissions:

    For IAM User authentication:

    • Enter the AWS Access Key you configured
    • Enter the AWS Secret Key you configured
    • Enter the Region of your AWS Glue deployment

    For IAM Role authentication:

  2. Click Test Authentication to confirm connectivity to AWS Glue.

  3. Once successful, at the bottom of the screen, click Next.
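
If Test Authentication fails, it can help to confirm outside Atlan that the IAM user's access key, secret key, and region can reach the AWS Glue Data Catalog at all. The following is a minimal sketch, not part of Atlan's workflow. It assumes the third-party boto3 SDK is installed, that the credentials are exported in the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION (names chosen for this sketch), and that the IAM user is allowed to call glue:GetDatabases. The helper names are hypothetical.

```python
import os


def glue_client_from_env():
    """Build a Glue client from environment variables (variable names are this sketch's choice)."""
    import boto3  # third-party AWS SDK for Python, imported lazily

    session = boto3.Session(
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
        region_name=os.environ["AWS_REGION"],
    )
    return session.client("glue")


def list_database_names(glue_client):
    """Return the names of all Data Catalog databases visible to the client."""
    names = []
    # GetDatabases is paginated, so walk every page of results.
    paginator = glue_client.get_paginator("get_databases")
    for page in paginator.paginate():
        names.extend(db["Name"] for db in page["DatabaseList"])
    return names
```

Calling list_database_names(glue_client_from_env()) should return the database names the credentials can see. An access-denied or invalid-credentials error at this point suggests the problem lies with the key pair, region, or IAM policy rather than with the Atlan configuration. A successful call does not by itself guarantee that the crawler has every permission it needs.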

Configure connection

Complete the connection configuration for your AWS Glue environment:

  1. Provide a Connection Name that represents your source environment. For example, you might want to use values like production, development, gold, or analytics.

  2. To change who can manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, nobody can manage the connection, not even admins.

  3. At the bottom of the screen, click Next to proceed.

Configure crawler

Configure the AWS Glue crawler settings to control which assets are included in the metadata extraction. If an asset appears in both the include and exclude filters, the exclude filter takes precedence.

  • Include Metadata: Select assets you want to include in crawling. This defaults to all assets if none are specified.
  • Exclude Metadata: Select assets you want to exclude from crawling. This defaults to no assets if none are specified.
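
The filter rules above can be illustrated with a short sketch. This is not Atlan's implementation. It treats assets as a flat list of names, and the function select_assets and the database names are hypothetical, chosen only to show the stated defaults and the precedence of the exclude filter.

```python
def select_assets(all_assets, include=None, exclude=None):
    """Return the assets that would be crawled under the stated filter rules."""
    # An empty include filter means "all assets".
    included = set(include) if include else set(all_assets)
    # An empty exclude filter means "no assets".
    excluded = set(exclude) if exclude else set()
    # Exclude wins when an asset appears in both filters.
    return [a for a in all_assets if a in included and a not in excluded]


catalog = ["sales_db", "marketing_db", "staging_db"]  # hypothetical names

# No filters: everything is crawled.
print(select_assets(catalog))
# prints ['sales_db', 'marketing_db', 'staging_db']

# staging_db is in both filters, so it is excluded.
print(select_assets(catalog, include=["sales_db", "staging_db"], exclude=["staging_db"]))
# prints ['sales_db']
```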

Run crawler

After completing the configuration:

  • To run the crawler once immediately, click Run at the bottom of the screen.
  • To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule & Run at the bottom of the screen.

Once the crawler has finished running, you can see the assets on Atlan's assets page! 🎉
