Crawl SageMaker
Configure and run the SageMaker crawler to extract lineage from your machine learning workflows and catalog your ML assets in Atlan.
Prerequisites
Before you begin, make sure you have:
- Completed SageMaker setup
- Admin or connection admin privileges in Atlan
- AWS credentials (either an Access Key ID and Secret Access Key, or an IAM Role ARN)
- AWS region where your SageMaker resources are located
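Before opening the workflow, it can help to confirm that the values you collected have the expected shape. The following is a minimal sketch, not part of Atlan or AWS tooling: the function names and regular expressions are assumptions made for illustration, and they check only the general format of a region code and an IAM role ARN, not whether the values exist in your account.

```python
import re

# Illustrative patterns only; AWS may use formats these do not cover.
REGION_RE = re.compile(r"[a-z]{2}(-[a-z]+)+-\d+")
ROLE_ARN_RE = re.compile(r"arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+")


def looks_like_region(value: str) -> bool:
    """Return True if value has the shape of a region code such as us-east-1."""
    return REGION_RE.fullmatch(value) is not None


def looks_like_role_arn(value: str) -> bool:
    """Return True if value has the shape of an IAM role ARN."""
    return ROLE_ARN_RE.fullmatch(value) is not None


if __name__ == "__main__":
    # The account ID and role name below are placeholders.
    print(looks_like_region("us-east-1"))
    print(looks_like_role_arn("arn:aws:iam::123456789012:role/example-role"))
```

A passing check means only that the value is well formed; the Test Connection step in the workflow remains the real verification.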
Create crawler workflow
Follow these steps to create a workflow in Atlan that captures metadata from SageMaker.
- In Atlan, select New > New Workflow.
- From the package list, choose SageMaker.
- Select Setup Workflow.
Configure authentication
Configure authentication for your extraction method:
- In Direct extraction, Atlan connects to your AWS SageMaker service and crawls metadata directly.
- In Agent extraction, Self-Deployed Runtime executes metadata extraction within your organization's environment.
Direct extraction
- Extraction method: Select Direct.
- Choose your authentication method:
  - IAM User: Enter your AWS Access Key ID and Secret Access Key
  - IAM Role: Enter your IAM Role ARN for cross-account access
- Enter your AWS credentials:
  - AWS Region: Enter your primary SageMaker region (for example, us-east-1)
  - For IAM User:
    - AWS Access Key ID: Enter your AWS Access Key ID
    - AWS Secret Access Key: Enter your AWS Secret Access Key
  - For IAM Role:
    - AWS Role ARN: Enter your IAM Role ARN for cross-account access
    - (Optional) External ID: Enter the external ID provided by Atlan support
- Click Test Connection to verify that your AWS credentials work correctly.
- Once the test succeeds, click Next.
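The Direct extraction fields differ by authentication method, which a short sketch can make explicit. The field names below mirror the labels in the steps above; the `REQUIRED_FIELDS` mapping and the `missing_fields` helper are hypothetical and are not an Atlan API.

```python
from __future__ import annotations

# Keys mirror the UI labels in this guide; the structure itself is illustrative.
REQUIRED_FIELDS = {
    "IAM User": ["AWS Region", "AWS Access Key ID", "AWS Secret Access Key"],
    "IAM Role": ["AWS Region", "AWS Role ARN"],  # External ID is optional
}


def missing_fields(auth_method: str, values: dict[str, str]) -> list[str]:
    """List the required fields that are absent or blank for the chosen method."""
    if auth_method not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown authentication method: {auth_method!r}")
    return [
        name
        for name in REQUIRED_FIELDS[auth_method]
        if not values.get(name, "").strip()
    ]


if __name__ == "__main__":
    # Only the region has been supplied, so the role ARN is reported as missing.
    print(missing_fields("IAM Role", {"AWS Region": "us-east-1"}))
```

An empty result means every required field for the selected method has a value; it says nothing about whether the credentials are valid.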
Agent extraction
Use Agent extraction when your AWS account or SageMaker service isn't reachable from Atlan Cloud (for example, it's behind a firewall). A Self-Deployed Runtime runs inside your network and connects to your AWS SageMaker service, then sends metadata to Atlan over an outbound connection.
Before configuring the crawler:
- Install Self-Deployed Runtime if you haven't already:
- Confirm the runtime can reach your AWS SageMaker service over your local network and that network security is configured.
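One way to confirm reachability from the machine hosting the runtime is a plain TCP check against the regional SageMaker API host. This is a minimal sketch under stated assumptions: the hostname pattern assumes the standard AWS partition (other partitions and VPC endpoints use different hostnames), the helper names are hypothetical, and a successful TCP connection shows only that the network path is open, not that credentials or permissions are correct.

```python
import socket


def sagemaker_api_host(region: str) -> str:
    """Build the regional SageMaker API hostname (standard AWS partition assumed)."""
    return f"api.sagemaker.{region}.amazonaws.com"


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    host = sagemaker_api_host("us-east-1")
    status = "reachable" if can_reach(host) else "not reachable"
    print(f"{host}: {status}")
```

If your environment routes traffic through a proxy, a direct socket check like this may fail even though the runtime itself can connect.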
To configure the crawler:
- Extraction method: Select Agent
- Configure AWS credentials by adding the secret keys for your secret store. For details on the required fields, refer to the Direct extraction section.
- Complete the Secure Agent configuration by following the instructions in the How to configure Secure Agent for workflow execution guide.
- Click Next after completing the configuration.
Configure connection
To complete the SageMaker connection configuration:
- Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
- (Optional) To change the users able to manage this connection, change the users or groups listed under Connection Admins.
  Warning: If you don't specify any user or group, nobody can manage the connection, not even admins.
- At the bottom of the screen, click Next to proceed.
Run crawler
After completing the previous steps, run the SageMaker crawler:
- To check for any permissions or other configuration issues before running the crawler, click Preflight checks.
- Then do one of the following:
  - To run the crawler once immediately, at the bottom of the screen, click the Run button.
  - To schedule the crawler to run hourly, daily, weekly, or monthly, at the bottom of the screen, click the Schedule Run button.
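To make the four scheduling cadences concrete, the sketch below maps each one to a standard five-field cron expression. This is purely illustrative: the schedule is set through the Atlan UI, this guide does not state how Atlan stores it, and the specific minute, hour, and day values are arbitrary choices for the example.

```python
# Standard cron field order: minute, hour, day of month, month, day of week.
# The times chosen here are arbitrary examples.
CRON_BY_FREQUENCY = {
    "hourly": "0 * * * *",   # minute 0 of every hour
    "daily": "0 2 * * *",    # 02:00 every day
    "weekly": "0 2 * * 0",   # 02:00 every Sunday
    "monthly": "0 2 1 * *",  # 02:00 on the 1st of each month
}


def cron_for(frequency: str) -> str:
    """Return an example cron expression for a named cadence."""
    key = frequency.strip().lower()
    if key not in CRON_BY_FREQUENCY:
        raise ValueError(f"Unsupported frequency: {frequency!r}")
    return CRON_BY_FREQUENCY[key]


if __name__ == "__main__":
    for name in CRON_BY_FREQUENCY:
        print(f"{name}: {cron_for(name)}")
```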
Once the crawler has completed running, you can see the assets in Atlan's asset page! 🎉
Troubleshooting
If you encounter connection or authentication issues during the crawl setup, see Connection and authentication issues for detailed troubleshooting steps.
See also
- What does Atlan crawl from SageMaker: Learn what assets and metadata Atlan extracts from SageMaker