Set up dbt Core
This guide explains how to set up dbt Core in Atlan, including configuring access, organizing your storage bucket, and uploading the necessary metadata files so Atlan can process and analyze your dbt project data.
Setup and access management
In this section, learn how to configure access for dbt Core so Atlan can connect to your storage location and read the required metadata. Choose between using your own cloud storage bucket or an Atlan-managed bucket.
- Use your own bucket (recommended)
- Use Atlan bucket
Go to Marketplace → search for dbt → click to set up dbt → select Object Storage, and then choose your cloud provider. Atlan supports reading from AWS, Azure, and GCP, and the setup process prompts for the information each provider requires. For authentication, refer to the following:
Amazon S3
Follow the instructions below to create an IAM role with the required permissions.
Azure ADLS
Follow the instructions below to create a service principal with the required permissions.
Google GCS
Follow the instructions below to create a service account with the required permissions.
To avoid access issues, Atlan can help you upload the required files to the same bucket where your tenant is hosted.
Amazon S3
Raise a support request to get the details of your Atlan S3 bucket, and include the ARN of the IAM user or IAM role that Atlan should grant access to. To upload the required files to your Atlan bucket, create an IAM policy with the necessary permissions and attach it to that IAM user or role, following the steps below.
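As a rough illustration, the sketch below creates such a policy with boto3 and attaches it to a role. The bucket name, prefix, policy name, and role name are hypothetical placeholders; the exact resource ARNs come from Atlan support.

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical values -- substitute the bucket and prefix provided by Atlan support.
BUCKET = "atlan-tenant-bucket"
PREFIX = "dbt-artifacts"
ROLE_NAME = "atlan-dbt-uploader"

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowUploadOfDbtArtifacts",
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/{PREFIX}/*",
            ],
        }
    ],
}

# Create the policy and attach it to the IAM role shared with Atlan.
policy = iam.create_policy(
    PolicyName="atlan-dbt-upload-policy",
    PolicyDocument=json.dumps(policy_document),
)
iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=policy["Policy"]["Arn"])
```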
Google Cloud Storage
To use Atlan's Google Cloud Storage (GCS) bucket, first create a new service account. Then raise a support request to share the service account's username with Atlan; the username has the format <service-account-name>@<project-id>.iam.gserviceaccount.com. The Atlan support team provides you with read and write access to a particular folder in the Atlan GCS bucket. Once Atlan has granted access, you can use the service account to upload the required files.
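Once access is granted, uploading with that service account might look like the following sketch using the google-cloud-storage Python client. The key file name, bucket name, and folder path are hypothetical; use the values the Atlan support team provides.

```python
from google.cloud import storage

# Hypothetical values -- replace with the key file, bucket, and folder from Atlan support.
client = storage.Client.from_service_account_json("atlan-uploader-key.json")
bucket = client.bucket("atlan-tenant-gcs-bucket")

# Upload one dbt artifact into the folder Atlan granted access to.
blob = bucket.blob("dbt-artifacts/project1/job1/manifest.json")
blob.upload_from_filename("target/manifest.json")
```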
Structure the bucket
Once you have configured access, the next step is to organize your storage bucket so that Atlan can correctly identify and process uploaded files.
Atlan uses the metadata.invocation_id and metadata.project_id attributes to uniquely identify and link the uploaded files; it doesn't use file paths to determine which project or job a file belongs to. The following directory structure is provided as a guideline.
Atlan supports extracting dbt metadata from a single dbt project or from multiple projects. The main-prefix has the format gcs|s3://<BUCKET_NAME>/<PATH_PREFIX> or abfss://<CONTAINER>/<PATH>. If you used Atlan's bucket, the Atlan support team provides the prefix after setting up access policies on your bucket.
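If you want to verify these identifiers before uploading, you can read them straight out of a generated artifact; a minimal sketch:

```python
import json

# Both identifiers live under the "metadata" key of the artifacts dbt writes to target/.
with open("target/manifest.json") as f:
    metadata = json.load(f)["metadata"]

print(metadata["invocation_id"])  # unique per dbt invocation; links manifest.json to run_results.json
print(metadata["project_id"])     # stable per dbt project; groups jobs under one project
```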
You need to use the following directory structure, even if you have a single dbt project:
main-prefix
- project1
  - job1
    - manifest.json
    - other files
  - job2
    - manifest.json
    - other files
  - job4
    - manifest.json
    - other files
- project3
  - job5
    - manifest.json
    - other files
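As one way to produce this layout, here is a sketch that uploads a job's artifacts into the project/job folders on Azure using the azure-storage-blob client. The account URL, credential, container, and prefix are assumptions to adapt to your own setup; the same key structure applies to S3 and GCS uploads.

```python
from pathlib import Path
from azure.storage.blob import BlobServiceClient

# Hypothetical values -- align them with your abfss://<CONTAINER>/<PATH> main-prefix.
ACCOUNT_URL = "https://mystorageaccount.blob.core.windows.net"
CONTAINER, PATH_PREFIX = "dbt-artifacts", "dbt"
PROJECT, JOB = "project1", "job1"

service = BlobServiceClient(account_url=ACCOUNT_URL, credential="<account-key-or-sas-token>")
container = service.get_container_client(CONTAINER)

# Upload each artifact from the dbt target directory into the project/job folder.
for artifact in ("manifest.json", "run_results.json"):
    blob_name = f"{PATH_PREFIX}/{PROJECT}/{JOB}/{artifact}"
    with open(Path("target") / artifact, "rb") as data:
        container.upload_blob(name=blob_name, data=data, overwrite=True)
```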
Upload project files
To extract the correct metadata, Atlan processes the manifest.json and run_results.json files for each job. There are many ways to load the metadata; the following are approaches suggested by Atlan. You need to upload the files from the target directory of the dbt project into distinct folders. Upload the run artifacts generated from the following commands (an end-to-end sketch follows the list):
- (Required) Compilation results: dbt compile --full-refresh
This command generates files that contain a full representation of your dbt project's resources, including models, tests, macros, node configurations, resource properties, and more.
Files to upload: manifest.json and run_results.json
Alternatively, you can upload the same files by running the dbt run --full-refresh command.
- (Optional) Test results: dbt test
This command executes all dbt tests in a dbt project and generates files that contain the test results.
Files to upload: manifest.json and run_results.json
- (Optional) Catalog: dbt docs generate
This command generates metadata about the tables and views produced by the models in your dbt project, for example, column data types and table statistics.
Files to upload: manifest.json and catalog.json
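Putting the steps together, a small orchestration script could run the required command and push the resulting artifacts into the expected job folder. This is a minimal sketch assuming your own S3 bucket and the placeholder names used earlier, not an official Atlan tool.

```python
import subprocess
from pathlib import Path
import boto3

# Hypothetical values -- align these with your main-prefix and folder structure.
BUCKET, PATH_PREFIX = "my-dbt-artifacts", "dbt"
PROJECT, JOB = "project1", "job1"
TARGET = Path("target")

# 1. Generate the required artifacts (manifest.json and run_results.json).
subprocess.run(["dbt", "compile", "--full-refresh"], check=True)

# 2. Upload them into the <main-prefix>/<project>/<job>/ folder Atlan expects.
s3 = boto3.client("s3")
for artifact in ("manifest.json", "run_results.json"):
    s3.upload_file(str(TARGET / artifact), BUCKET, f"{PATH_PREFIX}/{PROJECT}/{JOB}/{artifact}")
```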