Set up dbt Core
This guide explains how to set up dbt Core in Atlan, including configuring access, organizing your storage bucket, and uploading the necessary metadata files so Atlan can process and analyze your dbt project data.
Setup and access management
In this section, learn how to configure access for dbt Core so Atlan can connect to your storage location and read the required metadata. Choose between using your own cloud storage bucket or an Atlan-managed bucket.
- Use your own bucket
- Use Atlan bucket (recommended)
Use your own bucket
Use this option if you store dbt artifacts in your own cloud storage bucket. You create a dedicated read credential for Atlan, then configure the connector in Atlan with your bucket details.
- AWS (S3)
- GCP (GCS)
- Azure (ADLS)
AWS (S3)
Step 1: Obtain Atlan's dbt service identity ARN
Contact Atlan support to request the Atlan dbt service identity ARN. You need this value to configure the trust relationship in Step 3.
Step 2: Create IAM policy
- In your AWS account, go to IAM → Policies → Create policy.
- Select the JSON tab and paste the following, replacing <your-bucket> and <your-prefix> with your actual values:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AtlanDbtReadAccess",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket>",
        "arn:aws:s3:::<your-bucket>/<your-prefix>/*"
      ]
    }
  ]
}
- Name the policy (for example, AtlanDbtCoreReadPolicy) and create it.
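If you prefer to generate the policy document instead of editing the placeholders by hand, the same JSON can be produced programmatically. The following is a minimal sketch, not an Atlan or AWS tool: build_read_policy is a hypothetical helper name, the bucket and prefix values in the example are placeholders, and the handling of an empty prefix is an assumption that goes beyond the policy shown above.

```python
import json


def build_read_policy(bucket: str, prefix: str) -> dict:
    """Return the read-only policy from Step 2 for one bucket and prefix.

    Hypothetical helper for illustration; it only fills in the placeholders.
    """
    prefix = prefix.strip("/")
    # Assumption: with no prefix, object access covers the whole bucket.
    if prefix:
        object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*"
    else:
        object_arn = f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AtlanDbtReadAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", object_arn],
            }
        ],
    }


if __name__ == "__main__":
    # Example values only; substitute your own bucket and prefix.
    print(json.dumps(build_read_policy("my-dbt-artifacts", "dbt/prod"), indent=2))
```

The printed JSON can be pasted into the JSON tab of the policy editor in place of the hand-edited template.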
Step 3: Create IAM role with trust policy
- In AWS, go to IAM → Roles → Create role.
- Select Trusted entity type: AWS account → Another AWS account and enter the account ID from the Atlan dbt service identity ARN.
- Attach the policy you created in Step 2.
- Name the role (for example, AtlanDbtCoreRole) and create it.
- Open the new role and click Edit trust policy. Replace the policy with:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<atlan-dbt-service-identity-arn>"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<your-external-id>"
        }
      }
    }
  ]
}
Replace <atlan-dbt-service-identity-arn> with the ARN from Step 1, and <your-external-id> with a unique string of your choice (for example, atlan-dbt-external-id). Note this external ID—you enter it in Atlan in the next step.
- Copy the Role ARN from the role summary page (format: arn:aws:iam::123456789012:role/AtlanDbtCoreRole).
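The trust policy and the Role ARN format from Step 3 can be sketched in code as a sanity check before you move on. This is an illustrative sketch only: build_trust_policy and looks_like_role_arn are hypothetical helper names, and the pattern assumes the standard "aws" partition shown in the example ARN above.

```python
import re

# Matches the Role ARN format shown in Step 3:
# arn:aws:iam::<12-digit account ID>:role/<role name, optionally with a path>
# Assumption: standard "aws" partition only.
ROLE_ARN_PATTERN = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")


def build_trust_policy(atlan_identity_arn: str, external_id: str) -> dict:
    """Return the trust policy from Step 3 with both placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": atlan_identity_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def looks_like_role_arn(value: str) -> bool:
    """Return True if the value has the shape of an IAM Role ARN."""
    return ROLE_ARN_PATTERN.match(value) is not None
```

A quick check such as looks_like_role_arn("arn:aws:iam::123456789012:role/AtlanDbtCoreRole") helps catch a copied policy ARN or bucket ARN before it is entered in Atlan.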
Step 4: Configure connector in Atlan
- Go to Marketplace → search for dbt → click to set up dbt.
- For Source, select Core.
- For Manifest Source, select External Object Storage.
- For Storage Provider, select AWS. Set Authentication to IAM Role.
- Enter:
- AWS Role ARN: the Role ARN from Step 3
- Bucket Name: your S3 bucket name
- Prefix: the path within the bucket where dbt artifacts are stored
- Region: your bucket's AWS region
- Click Test Authentication to verify, then proceed to configure the crawler.
GCP (GCS)
Step 1: Create service account
- In the Google Cloud Console, go to IAM & Admin → Service accounts.
- Click Create service account and give it a name (for example, atlan-dbt-core).
- Grant the service account the Storage Object Viewer role (roles/storage.objectViewer) on the GCS bucket containing your dbt artifacts.
- Go to the service account's Keys tab → Add key → Create new key → JSON.
- Download the JSON key file.
For more detail on GCS IAM setup, see Set up Google Cloud Storage.
Step 2: Configure connector in Atlan
- Go to Marketplace → search for dbt → click to set up dbt.
- For Source, select Core.
- For Manifest Source, select External Object Storage.
- For Storage Provider, select GCP. Set Authentication to Service Account.
- Enter:
- Project ID: your GCP project ID
- Service Account JSON: paste the full contents of the JSON key file
- Bucket Name: your GCS bucket name
- Prefix: the path within the bucket where dbt artifacts are stored
- Click Test Authentication to verify, then proceed to configure the crawler.
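Because the Service Account JSON field expects the full contents of the key file, a truncated or wrong file is an easy mistake to make. The sketch below checks a pasted key before you submit it. It is an assumption-laden illustration: check_service_account_key is a hypothetical helper, and the field list reflects what a Google Cloud service account key file normally contains, not a list published by Atlan.

```python
import json
from typing import List

# Fields a Google Cloud service account JSON key normally contains.
# Assumption: Atlan may require more or fewer fields than these.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> List[str]:
    """Return a list of problems found in the key text; empty means none found."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]
    problems = [
        f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)
    ]
    # A user credential or other key type will not work as a service account key.
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

An empty list only means the file is well formed; Test Authentication in Atlan remains the real check that the key grants access to the bucket.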
Azure (ADLS)
Step 1: Register app and create client secret
- In the Azure Portal, go to Microsoft Entra ID (formerly Azure Active Directory) → App registrations → New registration.
- Give the app a name (for example, atlan-dbt-core) and register it.
- Note the Application (client) ID and Directory (tenant) ID from the overview page.
- Go to Certificates & secrets → New client secret. Set an expiry and copy the secret Value immediately (shown only once).
For more detail on Azure Service Principal setup, see Object storage for apps.
Step 2: Grant access to storage container
- In the Azure Portal, go to your Storage account → Containers → your container.
- Click Access Control (IAM) → Add role assignment.
- Assign the Storage Blob Data Reader role to the app registration created in Step 1.
Step 3: Configure connector in Atlan
- Go to Marketplace → search for dbt → click to set up dbt.
- For Source, select Core.
- For Manifest Source, select External Object Storage.
- For Storage Provider, select AZURE. Set Authentication to Service Principal.
- Enter:
- Tenant ID: Directory (tenant) ID from Step 1
- Client ID: Application (client) ID from Step 1
- Client Secret: the secret value from Step 1
- Account Name: your Azure Storage account name
- Container Name: the container holding your dbt artifacts
- Prefix: the path within the container where dbt artifacts are stored
- Click Test Authentication to verify, then proceed to configure the crawler.
Use Atlan bucket (recommended)
Use this option if you prefer not to manage cloud credentials. Atlan provisions a dedicated storage prefix in its own bucket and provides you with write credentials so your pipeline can upload dbt artifacts directly.
Step 1: Request storage prefix from Atlan support
Raise a support request and ask for:
- A dedicated Atlan-managed storage prefix for your dbt project
- Write credentials for your pipeline (IAM user for AWS, service account email for GCP, or equivalent for Azure)
Atlan provisions the prefix and shares the path and upload credentials with you.
Step 2: Update pipeline to upload artifacts
Once you receive your prefix, configure your CI/CD pipeline or dbt job runner to upload artifacts after each run using the upload credentials provided. See Structure the bucket below for the required directory layout.
Upload using standard CLI tools:
- AWS: aws s3 sync ./target/ s3://<atlan-bucket>/<your-prefix>/<project>/<job>/
- GCP: gsutil -m cp -r ./target/ gs://<atlan-bucket>/<your-prefix>/<project>/<job>/
- Azure: azcopy sync ./target/ "https://<account>.blob.core.windows.net/<container>/<your-prefix>/<project>/<job>/"
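In a CI/CD pipeline, the upload commands above are usually assembled from variables rather than typed by hand. The following sketch builds the argument list for each provider using the same commands and destination layout. build_upload_command is a hypothetical helper, and the root, prefix, project, and job values are placeholders you receive from Atlan support or define yourself.

```python
from typing import List

# Command prefixes taken from the upload commands listed above.
UPLOAD_COMMANDS = {
    "aws": ["aws", "s3", "sync"],
    "gcp": ["gsutil", "-m", "cp", "-r"],
    "azure": ["azcopy", "sync"],
}


def build_upload_command(
    provider: str,
    root: str,
    prefix: str,
    project: str,
    job: str,
    target_dir: str = "./target/",
) -> List[str]:
    """Build the argument list that uploads target_dir to <root>/<prefix>/<project>/<job>/.

    root is the provider-specific storage root, for example s3://<atlan-bucket>,
    gs://<atlan-bucket>, or https://<account>.blob.core.windows.net/<container>.
    """
    # Drop empty segments so an empty prefix does not produce a double slash.
    segments = [part.strip("/") for part in (prefix, project, job) if part.strip("/")]
    destination = root.rstrip("/") + "/" + "/".join(segments) + "/"
    return UPLOAD_COMMANDS[provider] + [target_dir, destination]
```

The returned list can be passed to subprocess.run(command, check=True) in a pipeline step that already has the upload credentials configured.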
Step 3: Configure connector in Atlan
- Go to Marketplace → search for dbt → click to set up dbt.
- For Source, select Core.
- For Manifest Source, select Atlan Object Storage.
- Enter the Object Storage Prefix provided by Atlan support.
- Proceed to configure the crawler.
Structure the bucket
Once you have configured access, the next step is to organize your storage bucket so that Atlan can correctly identify and process uploaded files.
Atlan uses the metadata.invocation_id and metadata.project_id attributes to uniquely identify and link the uploaded files. Atlan doesn't use the file paths to identify a project or job that the file belongs to. The following directory structure is provided as a guideline.
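To see which values Atlan would use to link your files, you can read the metadata block of each artifact before uploading. This is a minimal sketch, assuming each artifact is a JSON file with a top-level metadata object; read_artifact_ids and group_by_invocation are hypothetical helper names, and the "unknown" bucket for files without an invocation ID is an illustrative choice, not Atlan's behavior.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def read_artifact_ids(path: Path) -> Dict[str, Optional[str]]:
    """Return invocation_id and project_id from a dbt artifact's metadata block."""
    with path.open(encoding="utf-8") as handle:
        metadata = json.load(handle).get("metadata", {})
    # Some artifacts may omit one of the attributes; report None in that case.
    return {
        "invocation_id": metadata.get("invocation_id"),
        "project_id": metadata.get("project_id"),
    }


def group_by_invocation(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group artifact files by the invocation that produced them."""
    groups: Dict[str, List[Path]] = {}
    for path in paths:
        invocation_id = read_artifact_ids(path)["invocation_id"] or "unknown"
        groups.setdefault(invocation_id, []).append(path)
    return groups
```

Files that land in the same group came from the same dbt invocation, which is a quick way to confirm that a job folder does not mix artifacts from different runs.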
Atlan supports extracting dbt metadata from a single dbt project or from multiple dbt projects. The main-prefix has the format gcs://<BUCKET_NAME>/<PATH_PREFIX> or s3://<BUCKET_NAME>/<PATH_PREFIX>, or abfss://<CONTAINER>/<PATH> for Azure. If you use Atlan's bucket, the Atlan support team provides the main-prefix after setting up the access policies.
The <PATH_PREFIX> (or <PATH> for Azure) is optional. If your dbt project directories live at the bucket or container root, leave the Prefix field empty when you configure the crawler and place your project folders directly under the bucket or container.
You need to use the following directory structure, even if you have a single dbt project:
main-prefix
- project1
  - job1
    - manifest.json
    - other files
  - job2
    - manifest.json
    - other files
  - job4
    - manifest.json
    - other files
- project3
  - job5
    - manifest.json
    - other files
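Before uploading, the layout above can be checked against a local copy of the directory tree. The sketch below is illustrative only: find_layout_problems is a hypothetical helper, it assumes the main-prefix contents are mirrored in a local folder, and it checks only for manifest.json because that is the one file named in every job folder above.

```python
from pathlib import Path
from typing import List


def find_layout_problems(main_prefix: Path) -> List[str]:
    """Return a description of each project or job folder that breaks the layout."""
    problems: List[str] = []
    projects = sorted(p for p in main_prefix.iterdir() if p.is_dir())
    for project in projects:
        jobs = sorted(j for j in project.iterdir() if j.is_dir())
        if not jobs:
            # Every project folder is expected to contain at least one job folder.
            problems.append(f"{project.name}: no job folders")
        for job in jobs:
            if not (job / "manifest.json").is_file():
                problems.append(f"{project.name}/{job.name}: missing manifest.json")
    return problems
```

An empty result means every project folder has at least one job folder and every job folder has a manifest.json, including the single-project case.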
Upload project files
To load correct metadata, Atlan processes the manifest.json and run_results.json files for each job. There are many ways to load the metadata; the following are the approaches Atlan suggests. Upload the files from the target directory of the dbt project into distinct folders. Upload the run artifacts generated by the following commands:
- (Required) Compilation results:
dbt compile --full-refresh
This command generates files that contain a full representation of your dbt project's resources, including models, tests, macros, node configurations, resource properties, and more.
Files to upload: manifest.json and run_results.json
Alternatively, you can upload the same files by running the dbt run --full-refresh command.
- (Optional) Test results:
dbt test
This command executes all dbt tests in a dbt project and generates files that contain the test results.
Files to upload: manifest.json and run_results.json
- (Optional) Catalog:
dbt docs generate
This command generates metadata about the tables and views produced by the models in your dbt project, for example, column data types and table statistics.
Files to upload: manifest.json and catalog.json
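The mapping from command to files listed above can be encoded so that a pipeline step fails early when an expected artifact is missing. This is a sketch under stated assumptions: artifacts_to_upload is a hypothetical helper, and the table simply restates the "Files to upload" lines from this section.

```python
from pathlib import Path
from typing import List

# Files to upload after each command, as listed in this section.
ARTIFACTS_BY_COMMAND = {
    "dbt compile --full-refresh": ("manifest.json", "run_results.json"),
    "dbt test": ("manifest.json", "run_results.json"),
    "dbt docs generate": ("manifest.json", "catalog.json"),
}


def artifacts_to_upload(target_dir: Path, command: str) -> List[Path]:
    """Return the artifact paths to upload after running the given dbt command.

    Raises FileNotFoundError if any expected file is absent from target_dir.
    """
    expected = ARTIFACTS_BY_COMMAND[command]
    missing = [name for name in expected if not (target_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"{command!r} did not leave {', '.join(missing)} in {target_dir}"
        )
    return [target_dir / name for name in expected]
```

For example, calling artifacts_to_upload(Path("target"), "dbt docs generate") right after the command returns the manifest.json and catalog.json paths, or raises if catalog.json was not produced.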