Set up dbt Core
This guide explains how to set up dbt Core in Atlan, including configuring access, organizing your storage bucket, and uploading the necessary metadata files so Atlan can process and analyze your dbt project data.
Setup and access management
In this section, learn how to configure access for dbt Core so Atlan can connect to your storage location and read the required metadata. Choose between using your own cloud storage bucket or an Atlan-managed bucket.
- Use your own bucket (recommended)
- Use Atlan bucket
Go to Marketplace → search for dbt → click to set up dbt → select Object Storage, and then choose your cloud provider. Atlan supports reading from AWS, Azure, and GCP, and the setup process prompts for the information each provider requires. For authentication, refer to the following:
Amazon S3
Follow the instructions below to create an IAM role with the required permissions.
Azure ADLS
Follow the instructions below to create a service principal with the required permissions.
Google GCS
Follow the instructions below to create a service account with the required permissions.
To avoid access issues, Atlan can help you upload the required files to the same bucket where your tenant is hosted.
Amazon S3
Raise a support request to get the details of your Atlan S3 bucket, and include the ARN of the IAM user or IAM role that Atlan should grant access to. To upload the required files to your Atlan bucket, create an IAM policy with the necessary permissions and attach it to that IAM user or role, following the steps below.
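As a rough illustration, the sketch below creates such a policy with boto3 and attaches it to a role. The bucket name, prefix, policy name, and role name are hypothetical placeholders; the exact resource ARNs come from Atlan support.

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical values -- substitute the bucket and prefix provided by Atlan support.
BUCKET = "atlan-tenant-bucket"
PREFIX = "dbt-artifacts"
ROLE_NAME = "atlan-dbt-uploader"

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowUploadOfDbtArtifacts",
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/{PREFIX}/*",
            ],
        }
    ],
}

# Create the policy and attach it to the IAM role shared with Atlan.
policy = iam.create_policy(
    PolicyName="atlan-dbt-upload-policy",
    PolicyDocument=json.dumps(policy_document),
)
iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=policy["Policy"]["Arn"])
```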
Google Cloud Storage
To use Atlan's Google Cloud Storage (GCS) bucket, first create a new service account. Then raise a support request to share the service account's username with Atlan; the username has the format <service-account-name>@<project-id>.iam.gserviceaccount.com. The Atlan support team provides you with read and write access to a particular folder in the Atlan GCS bucket. Once Atlan has granted access, you can use the service account to upload the required files.
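Once access is granted, uploading with that service account might look like the following sketch using the google-cloud-storage Python client. The key file name, bucket name, and folder path are hypothetical; use the values the Atlan support team provides.

```python
from google.cloud import storage

# Hypothetical values -- replace with the key file, bucket, and folder from Atlan support.
client = storage.Client.from_service_account_json("atlan-uploader-key.json")
bucket = client.bucket("atlan-tenant-gcs-bucket")

# Upload one dbt artifact into the folder Atlan granted access to.
blob = bucket.blob("dbt-artifacts/project1/job1/manifest.json")
blob.upload_from_filename("target/manifest.json")
```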
Structure the bucket
Once you have configured access, the next step is to organize your storage bucket so that Atlan can correctly identify and process uploaded files.
Atlan uses the metadata.invocation_id and metadata.project_id attributes to uniquely identify and link the uploaded files; it doesn't use file paths to determine which project or job a file belongs to. The following directory structure is provided as a guideline.
Atlan supports extracting dbt metadata from a single dbt project or from multiple projects. The main-prefix has the format gcs|s3://<BUCKET_NAME>/<PATH_PREFIX> or abfss://<CONTAINER>/<PATH>. If you used Atlan's bucket, the Atlan support team provides the prefix after setting up access policies on your bucket.
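If you want to verify these identifiers before uploading, you can read them straight out of a generated artifact; a minimal sketch:

```python
import json

# Both identifiers live under the "metadata" key of the artifacts dbt writes to target/.
with open("target/manifest.json") as f:
    metadata = json.load(f)["metadata"]

print(metadata["invocation_id"])  # unique per dbt invocation; links manifest.json to run_results.json
print(metadata["project_id"])     # stable per dbt project; groups jobs under one project
```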
You need to use the following directory structure, even if you have a single dbt project:
main-prefix
- project1
  - job1
    - manifest.json
    - other files
  - job2
    - manifest.json
    - other files
  - job4
    - manifest.json
    - other files
- project3
  - job5
    - manifest.json
    - other files
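As one way to produce this layout, here is a sketch that uploads a job's artifacts into the project/job folders on Azure using the azure-storage-blob client. The account URL, credential, container, and prefix are assumptions to adapt to your own setup; the same key structure applies to S3 and GCS uploads.

```python
from pathlib import Path
from azure.storage.blob import BlobServiceClient

# Hypothetical values -- align them with your abfss://<CONTAINER>/<PATH> main-prefix.
ACCOUNT_URL = "https://mystorageaccount.blob.core.windows.net"
CONTAINER, PATH_PREFIX = "dbt-artifacts", "dbt"
PROJECT, JOB = "project1", "job1"

service = BlobServiceClient(account_url=ACCOUNT_URL, credential="<account-key-or-sas-token>")
container = service.get_container_client(CONTAINER)

# Upload each artifact from the dbt target directory into the project/job folder.
for artifact in ("manifest.json", "run_results.json"):
    blob_name = f"{PATH_PREFIX}/{PROJECT}/{JOB}/{artifact}"
    with open(Path("target") / artifact, "rb") as data:
        container.upload_blob(name=blob_name, data=data, overwrite=True)
```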
Upload project files
To extract the correct metadata, Atlan processes the manifest.json and run_results.json files for each job. There are many ways to load the metadata; the following are approaches suggested by Atlan. You need to upload the files from the target directory of the dbt project into distinct folders. Upload the run artifacts generated from the following commands (an end-to-end sketch follows the list):
- (Required) Compilation results: dbt compile --full-refresh
This command generates files that contain a full representation of your dbt project's resources, including models, tests, macros, node configurations, resource properties, and more.
Files to upload: manifest.json and run_results.json
Alternatively, you can upload the same files by running the dbt run --full-refresh command.
- (Optional) Test results: dbt test
This command executes all dbt tests in a dbt project and generates files that contain the test results.
Files to upload: manifest.json and run_results.json
- (Optional) Catalog: dbt docs generate
This command generates metadata about the tables and views produced by the models in your dbt project, for example, column data types and table statistics.
Files to upload: manifest.json and catalog.json
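Putting the steps together, a small orchestration script could run the required command and push the resulting artifacts into the expected job folder. This is a minimal sketch assuming your own S3 bucket and the placeholder names used earlier, not an official Atlan tool.

```python
import subprocess
from pathlib import Path
import boto3

# Hypothetical values -- align these with your main-prefix and folder structure.
BUCKET, PATH_PREFIX = "my-dbt-artifacts", "dbt"
PROJECT, JOB = "project1", "job1"
TARGET = Path("target")

# 1. Generate the required artifacts (manifest.json and run_results.json).
subprocess.run(["dbt", "compile", "--full-refresh"], check=True)

# 2. Upload them into the <main-prefix>/<project>/<job>/ folder Atlan expects.
s3 = boto3.client("s3")
for artifact in ("manifest.json", "run_results.json"):
    s3.upload_file(str(TARGET / artifact), BUCKET, f"{PATH_PREFIX}/{PROJECT}/{JOB}/{artifact}")
```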