Crawl dbt
Configure and run the dbt crawler to extract metadata from your dbt Cloud or dbt Core projects and enrich your assets with dbt model information, lineage, and documentation.
Prerequisites
Before you begin, make sure you have:
- Completed dbt Cloud setup (for dbt Cloud) or dbt Core setup (for dbt Core)
- Admin or connection admin privileges in Atlan
- Reviewed the order of operations for metadata enrichment workflows
Create crawler workflow
Follow these steps to create a workflow in Atlan that captures metadata from dbt.
- In Atlan, select New > New Workflow.
- From the package list, choose dbt Assets.
- Select Setup Workflow.
Configure authentication
Choose your dbt source and provide the required credentials.
For dbt Cloud:
- For Extraction method, click Cloud.
- For Host Name, enter the domain name of your dbt Cloud instance, if not the default. Include the https://. For example: https://cloud.getdbt.com. For more information on access URLs, refer to dbt documentation.
- For Authentication Type, select Service Account or PAT depending on your token type.
- Enter your dbt Cloud token in the Token field.
- Click Test Authentication to verify the connection.
For dbt Core:
- For Extraction method, click Object storage.
- Select your Cloud Provider: AWS, GCP, or Azure.
- Choose the Authentication type (for example, IAM Role) and provide the required role or access details.
- Enter your bucket name, prefix, and region. You can find these details in your cloud storage configuration.
- Click Test Authentication to validate access.
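The Host Name field for dbt Cloud expects a value that includes the https:// scheme. As a minimal sketch, assuming you want to check a value before pasting it into the form, the helper below normalizes a host string. The function name `normalize_dbt_host` and the constant `DEFAULT_DBT_CLOUD_HOST` are hypothetical and are not part of Atlan or dbt; treating https://cloud.getdbt.com as the fallback is an assumption based on the example above.

```python
from urllib.parse import urlparse

# Assumption: the example URL from the step above is used as the fallback value.
DEFAULT_DBT_CLOUD_HOST = "https://cloud.getdbt.com"


def normalize_dbt_host(host: str = "") -> str:
    """Return a host value that starts with https:// and has no path."""
    host = host.strip()
    if not host:
        # Nothing entered: fall back to the assumed default host.
        return DEFAULT_DBT_CLOUD_HOST
    if "://" not in host:
        # A bare domain was entered; add the scheme the field asks for.
        host = "https://" + host
    parsed = urlparse(host)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"expected an https:// URL, got {host!r}")
    # Drop any trailing path so only scheme and domain remain.
    return f"https://{parsed.netloc}"
```

For example, `normalize_dbt_host("cloud.getdbt.com/")` yields `https://cloud.getdbt.com`, while a value that starts with `http://` raises a `ValueError`.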
Configure connection
To complete the dbt connection configuration:
- Provide a Connection Name that represents your source environment. For example, you might use values like analytics, production, or development.
- (Optional) To change the users able to manage this connection, change the users or groups listed under Connection Admins.
  Warning: If you don't specify any user or group, nobody can manage the connection, not even admins.
- At the bottom of the screen, click Next to proceed.
Configure dbt settings
The configuration options change based on the Extraction method you selected earlier, Cloud or Core (object storage). Follow this step to fine-tune how dbt metadata is enriched in Atlan.
For dbt Cloud:
- Under Exclude Metadata, choose projects or environments you don't want to include in enrichment. Leave blank if you want all available projects.
- Under Include Metadata, select specific projects or environments to include.
- To limit the enrichment to a particular connection with materialized assets, click Connection and select the relevant option. (This defaults to all connections, if none are specified.)
- For Import Tags, click Yes to sync dbt tags from your Cloud workspace into Atlan.
For dbt Core:
- For Enrich Metadata in Materialized Assets, click Yes to enable enrichment for both dbt and materialized assets.
- To limit the enrichment to a particular connection with materialized assets, click Connection and select the relevant option. (This defaults to all connections, if none are specified.)
- If you want to import tags defined in your dbt files, click Yes under Import Tags.
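The selection rules described above (blank means everything, a chosen connection narrows the scope) can be pictured with a small sketch. This is only an illustration of the documented defaults, not Atlan's implementation: the function names are hypothetical, and the rule that an exclusion wins over an inclusion when a project appears in both lists is an assumption that the text above does not confirm.

```python
def select_projects(available, include=(), exclude=()):
    """Apply Include Metadata and Exclude Metadata selections to project names."""
    # A blank include list means all available projects are candidates.
    if include:
        chosen = [p for p in available if p in include]
    else:
        chosen = list(available)
    # Assumption: a project listed in both include and exclude is left out.
    return [p for p in chosen if p not in exclude]


def target_connections(all_connections, selected=()):
    """Return the connections whose materialized assets get enriched."""
    # With no connection selected, enrichment defaults to all connections.
    return list(selected) if selected else list(all_connections)
```

With both lists blank, `select_projects(["sales", "finance"])` returns both projects, and `target_connections(["snowflake-prod", "bigquery-dev"])` returns both connections. The project and connection names here are placeholders.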
Run crawler
To run the dbt crawler, after completing the previous steps:
- To check for any permissions or other configuration issues before running the crawler, click Preflight checks.
- You can either:
  - To run the crawler once immediately, at the bottom of the screen, click the Run button.
  - To schedule the crawler to run hourly, daily, weekly, or monthly, at the bottom of the screen, click the Schedule Run button.
Once the crawler has completed running, you can see the assets in Atlan's asset page! 🎉
See also
- Set up dbt Cloud: Configure authentication for dbt Cloud
- Set up dbt Core: Upload dbt Core project files to cloud storage
- Preflight checks for dbt: Verify permissions and configuration before crawling
- Manage dbt tags: Import and manage tags from dbt in Atlan