Crawl AWS SageMaker Unified Studio

Configure and run the SMUS Catalog crawler to extract assets, metadata, and lineage from SMUS, then catalog and enrich them in Atlan.

Prerequisites

Before you begin, make sure you have:

Create crawler workflow

To crawl metadata from AWS SageMaker Unified Studio, review the order of operations and then complete the following steps.

  1. In the top right of any screen, navigate to New and then click New Workflow.
  2. From the list of packages, select AWS SageMaker Unified Studio and click Setup Workflow.

Configure authentication

Configure how Atlan authenticates with your AWS account to access SageMaker Unified Studio. Choose the authentication method that matches your security requirements.

  1. Enter your role details:
    • AWS Role ARN: Enter your IAM Role ARN for cross-account access (for example, arn:aws:iam::123456789012:role/role-name). This is the SMUS IAM Role created by deploying the CloudFormation template earlier.
    • External ID: Enter an external ID for additional security.
    • Atlan API Token: Enter your Atlan API token.
    • Region: Enter your primary SageMaker Catalog region (for example, us-east-1).
  2. Click Test Authentication to verify your AWS credentials work correctly.
  3. Once successful, click Next.
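
Before pasting values into the form, it can help to check that they are at least well-formed. The following is a minimal sketch, not part of Atlan or AWS tooling: the function name check_auth_inputs and both patterns are assumptions made for illustration, and they only check the general shape of an IAM Role ARN and a region code. The check does not replace Test Authentication, which is what actually verifies access.

```python
import re

# Assumed shape of an IAM role ARN:
# arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
ROLE_ARN_RE = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@\-/]+)$"
)

# Assumed shape of a region code such as us-east-1 or ap-southeast-2.
# This is a pattern check only; it does not confirm that the region exists.
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d$")


def check_auth_inputs(role_arn: str, external_id: str, region: str) -> list[str]:
    """Return a list of problems; an empty list means the values look well-formed."""
    problems = []
    if not ROLE_ARN_RE.match(role_arn.strip()):
        problems.append(
            "AWS Role ARN does not look like arn:aws:iam::<account-id>:role/<role-name>"
        )
    if not external_id.strip():
        problems.append("External ID is empty")
    if not REGION_RE.match(region.strip()):
        problems.append("Region does not look like a region code such as us-east-1")
    return problems


if __name__ == "__main__":
    # Placeholder values; substitute your own before running.
    for problem in check_auth_inputs(
        "arn:aws:iam::123456789012:role/role-name",
        "example-external-id",
        "us-east-1",
    ):
        print(problem)
```

A typo such as a short account ID or a region written as us-east1 is reported by this check, so you can correct it before clicking Test Authentication.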

Configure connection

  1. Provide a Connection Name that represents your source environment. For example, you might want to use values like production, development, or analytics.
  2. To change who can manage this connection, update the users or groups listed under Connection Admins. If you don't specify any user or group, no one can manage the connection, not even admins.
  3. Click Next to proceed.

Configure crawler

  1. For Enrich Glossary, choose whether to ingest glossaries from SMUS:
    • Yes: Atlan ingests glossaries from SMUS and enriches them in Atlan.
    • No: Atlan skips glossary ingestion.
  2. If you selected Yes, select the Glossary in Atlan you want to enrich with glossaries from SMUS.
  3. Click Next to proceed.

Run crawler

  1. To check for any permissions or configuration issues before running the crawler, click Preflight checks. For details about the checks performed, see Preflight checks for Amazon SageMaker Unified Studio.

  2. Choose how to run the crawler:

    • To run the crawler once immediately, click Run.
    • To schedule the crawler to run hourly, daily, weekly, or monthly, click Schedule Run.

Once the crawler has finished running, run the domain-assets linking script to link your SMUS assets to the corresponding Data Domains in Atlan.

After running the script, you can see the assets on Atlan's assets page! 🎉
