Crawl SageMaker
Configure and run the SageMaker crawler to extract lineage from your machine learning workflows and catalog your ML assets in Atlan.
Prerequisites
Before you begin, make sure you have:
- Completed SageMaker setup
- Admin or connection admin privileges in Atlan
- AWS credentials: either an Access Key ID and Secret Access Key, or an IAM Role ARN
- AWS region where your SageMaker resources are located
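Before entering these values in Atlan, it can help to sanity-check their shape. The following is a minimal sketch, not part of Atlan or AWS tooling: the function names `looks_like_region` and `looks_like_role_arn` are hypothetical, and the patterns are heuristics based on the usual AWS formats (region codes such as `us-east-1`, and role ARNs of the form `arn:aws:iam::<12-digit account ID>:role/<role name>`). It checks syntax only and can't tell whether the region or role actually exists.

```python
import re

# Heuristic pattern for region codes such as us-east-1 or us-gov-west-1.
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")

# Heuristic pattern for IAM role ARNs:
# arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
ROLE_ARN_RE = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def looks_like_region(value: str) -> bool:
    """Return True if the value has the general shape of an AWS region code."""
    return REGION_RE.fullmatch(value.strip()) is not None


def looks_like_role_arn(value: str) -> bool:
    """Return True if the value has the general shape of an IAM role ARN."""
    return ROLE_ARN_RE.fullmatch(value.strip()) is not None
```

A value that fails these checks is likely a typo. A value that passes may still be rejected by Test Connection if the role or region isn't valid for your account.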
Create crawler workflow
Follow these steps to create a workflow in Atlan that captures metadata from SageMaker.
- In Atlan, select New > New Workflow.
- From the package list, choose SageMaker.
- Select Setup Workflow.
Configure authentication
Choose your authentication method and enter your AWS credentials.
- IAM user
  - Enter your AWS credentials:
    - AWS Access Key ID: Enter your AWS Access Key ID
    - AWS Secret Access Key: Enter your AWS Secret Access Key
    - AWS Region: Enter your primary SageMaker region (for example, us-east-1)
  - Click Test Connection to verify that your AWS credentials work correctly.
- IAM role
  - Enter your role details:
    - AWS Region: Enter your primary SageMaker region (for example, us-east-1)
    - AWS Role ARN: Enter your IAM Role ARN for cross-account access
    - (Optional) External ID: Enter the external ID provided by Atlan support
  - Click Test Connection to verify that your AWS credentials work correctly.
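The two authentication methods ask for different fields. This minimal sketch summarizes which ones each method requires, as listed in the steps above. The method keys, field labels, and the `missing_fields` helper are hypothetical names chosen for illustration and aren't Atlan API identifiers.

```python
# Fields each authentication method asks for, as listed in the steps above.
# The External ID for the IAM role method is optional, so it is not checked.
REQUIRED_FIELDS = {
    "iam_user": ("access_key_id", "secret_access_key", "region"),
    "iam_role": ("region", "role_arn"),
}


def missing_fields(method: str, values: dict) -> list:
    """Return the required fields that are absent or blank for a method."""
    if method not in REQUIRED_FIELDS:
        raise ValueError(f"unknown authentication method: {method!r}")
    return [
        name
        for name in REQUIRED_FIELDS[method]
        if not str(values.get(name) or "").strip()
    ]


# Example: an IAM role setup where only the region has been filled in.
print(missing_fields("iam_role", {"region": "us-east-1"}))  # ['role_arn']
```

An empty result means every required field is filled in. Only Test Connection can confirm that the credentials themselves work.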
Configure connection
To complete the SageMaker connection configuration:
- Provide a Connection Name that represents your source environment. For example, you might use values like production, development, gold, or analytics.
- (Optional) To change who can manage this connection, update the users or groups listed under Connection Admins.
  Warning: If you don't specify any user or group, nobody can manage the connection, not even admins.
- At the bottom of the screen, click Next to proceed.
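The Connection Admins warning is easy to overlook, so here is a minimal sketch of a review step for the values on this screen. The `ConnectionSettings` class and `review` function are hypothetical and exist only for illustration. Atlan doesn't expose them.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionSettings:
    # Hypothetical container for the values entered on this screen.
    name: str
    admin_users: list = field(default_factory=list)
    admin_groups: list = field(default_factory=list)


def review(settings: ConnectionSettings) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not settings.name.strip():
        problems.append("connection name is empty")
    if not settings.admin_users and not settings.admin_groups:
        problems.append("no connection admins: nobody could manage the connection")
    return problems
```

For example, `review(ConnectionSettings("production"))` reports the missing admins, while adding at least one user or group clears the warning.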
Run crawler
After completing the previous steps, run the SageMaker crawler:
- To check for permissions or other configuration issues before running the crawler, click Preflight checks.
- Then do one of the following:
  - To run the crawler once immediately, click the Run button at the bottom of the screen.
  - To schedule the crawler to run hourly, daily, weekly, or monthly, click the Schedule Run button at the bottom of the screen.
Once the crawler has completed running, you can see the assets in Atlan's asset page! 🎉
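If you think about schedules as cron expressions, the four frequencies offered by Schedule Run correspond roughly to the standard five-field expressions below. This is an illustrative sketch only. The specific times (top of the hour, midnight, Sunday, first of the month) are assumptions rather than Atlan defaults, and the `cron_for` helper is hypothetical.

```python
# Standard five-field cron expressions:
# minute hour day-of-month month day-of-week
# The chosen times are illustrative assumptions, not Atlan defaults.
SCHEDULES = {
    "hourly": "0 * * * *",   # at minute 0 of every hour
    "daily": "0 0 * * *",    # at 00:00 every day
    "weekly": "0 0 * * 0",   # at 00:00 every Sunday
    "monthly": "0 0 1 * *",  # at 00:00 on the first day of each month
}


def cron_for(frequency: str) -> str:
    """Look up the illustrative cron expression for a schedule frequency."""
    try:
        return SCHEDULES[frequency.lower()]
    except KeyError:
        raise ValueError(f"unsupported frequency: {frequency!r}") from None
```

In practice you choose the frequency and timing in the Schedule Run dialog. The mapping above is only a mental model for how often the crawler would run.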
Troubleshooting
If you encounter connection or authentication issues during the crawl setup, see Connection and authentication issues for detailed troubleshooting steps.
See also
- What does Atlan crawl from SageMaker: Learn what assets and metadata Atlan extracts from SageMaker