Crawl SageMaker

Configure and run the SageMaker crawler to extract lineage from your machine learning workflows and catalog your ML assets in Atlan.

Prerequisites

Before you begin, make sure you have:

  • Completed SageMaker setup
  • Admin or connection admin privileges in Atlan
  • AWS credentials (either an Access Key ID and Secret Access Key, or an IAM Role ARN)
  • AWS region where your SageMaker resources are located
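Before entering these values in Atlan, a quick format check can catch copy-paste mistakes. The sketch below is illustrative only: the helper names and regular expressions are assumptions based on the standard IAM role ARN layout (`arn:<partition>:iam::<12-digit account ID>:role/<name>`) and the usual region naming pattern (for example, `us-east-1`). They are not part of Atlan or the AWS SDK, and a value that passes may still be rejected by AWS.

```python
import re

# Assumed patterns: standard IAM role ARN layout and common region naming.
ROLE_ARN_RE = re.compile(r"^arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+$")
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def looks_like_role_arn(value: str) -> bool:
    """Return True if the value has the shape of an IAM role ARN."""
    return bool(ROLE_ARN_RE.match(value.strip()))


def looks_like_region(value: str) -> bool:
    """Return True if the value has the shape of an AWS region code."""
    return bool(REGION_RE.match(value.strip()))


if __name__ == "__main__":
    # Hypothetical example values; replace with your own.
    print(looks_like_region("us-east-1"))
    print(looks_like_role_arn("arn:aws:iam::123456789012:role/atlan-sagemaker"))
```

These checks only confirm the shape of the values; they don't confirm that the role or region exists.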

Create crawler workflow

Follow these steps to create a workflow in Atlan that captures metadata from SageMaker.

  1. In Atlan, select New > New Workflow.

  2. From the package list, choose SageMaker.

  3. Select Setup Workflow.

Configure authentication

Configure authentication for your extraction method:

  • In Direct extraction, Atlan connects to your AWS SageMaker service and crawls metadata directly.
  • In Agent extraction, the Self-Deployed Runtime executes metadata extraction within your organization's environment.

  1. Extraction method: Select Direct

  2. Choose your authentication method:

    • IAM User: Authenticate with an AWS Access Key ID and Secret Access Key
    • IAM Role: Authenticate with an IAM Role ARN for cross-account access

  3. Enter your AWS credentials:

    • AWS Region: Enter your primary SageMaker region (for example, us-east-1)
    • For IAM User:
      • AWS Access Key ID: Enter your AWS Access Key ID
      • AWS Secret Access Key: Enter your AWS Secret Access Key
    • For IAM Role:
      • AWS Role ARN: Enter your IAM Role ARN for cross-account access
      • (Optional) External ID: Enter the external ID provided by Atlan support

  4. Click Test Connection to verify your AWS credentials work correctly.

  5. Once successful, click Next.
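The same credentials can be sanity-checked outside Atlan before you click Test Connection. The following is a minimal sketch, assuming the `boto3` SDK is installed. The function names, the session name `atlan-precheck`, and the use of `list_models` as a read-only probe are illustrative assumptions, not a description of what Atlan's connection test does. For the IAM Role method, Atlan assumes the role from its own account, so a local check only confirms that the role exists and can read SageMaker when assumed by your own identity.

```python
from typing import Optional


def build_assume_role_kwargs(role_arn: str, external_id: Optional[str] = None,
                             session_name: str = "atlan-precheck") -> dict:
    """Build keyword arguments for STS AssumeRole; ExternalId is added only when given."""
    kwargs = {"RoleArn": role_arn, "RoleSessionName": session_name}
    if external_id:
        kwargs["ExternalId"] = external_id
    return kwargs


def check_sagemaker_access(region: str,
                           access_key_id: Optional[str] = None,
                           secret_access_key: Optional[str] = None,
                           role_arn: Optional[str] = None,
                           external_id: Optional[str] = None) -> str:
    """Return the caller ARN after a read-only SageMaker call succeeds."""
    import boto3  # imported lazily so the helper above works without the SDK

    # With no keys supplied, boto3 falls back to its default credential chain.
    session = boto3.Session(aws_access_key_id=access_key_id,
                            aws_secret_access_key=secret_access_key,
                            region_name=region)
    if role_arn:
        creds = session.client("sts").assume_role(
            **build_assume_role_kwargs(role_arn, external_id))["Credentials"]
        session = boto3.Session(aws_access_key_id=creds["AccessKeyId"],
                                aws_secret_access_key=creds["SecretAccessKey"],
                                aws_session_token=creds["SessionToken"],
                                region_name=region)
    # Raises botocore.exceptions.ClientError if the identity lacks permission.
    session.client("sagemaker").list_models(MaxResults=1)
    return session.client("sts").get_caller_identity()["Arn"]
```

Calling `check_sagemaker_access("us-east-1", role_arn=...)` with your own role ARN either returns the ARN of the identity that made the call or raises an error naming the failed AWS operation, which can help narrow down a failed connection test.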

Configure connection

To complete the SageMaker connection configuration:

  1. Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.

  2. (Optional) To change who can manage this connection, update the users or groups listed under Connection Admins.

    Warning: If you don't specify any user or group, nobody can manage the connection, not even admins.

  3. At the bottom of the screen, click Next to proceed.

Run crawler

After completing the previous steps, run the SageMaker crawler:

  1. To check for permission or other configuration issues before running the crawler, click Preflight checks.
  2. Then do one of the following:
    • To run the crawler once immediately, click the Run button at the bottom of the screen.
    • To schedule the crawler to run hourly, daily, weekly, or monthly, click the Schedule Run button at the bottom of the screen.

Once the crawler has finished running, you can see the assets on Atlan's assets page! 🎉

Troubleshooting

If you encounter connection or authentication issues during the crawl setup, see Connection and authentication issues for detailed troubleshooting steps.
