Steps to integrate your dbt models with Atlan
Atlan integrates with dbt, so you can bring your dbt models into your Atlan workspace.
You can set up a dbt integration with your Atlan workspace in five easy steps:
Store your dbt artifact files in the S3 bucket using the recommended folder structure ☁️
Select the source in Atlan, aka dbt 😉
Provide your credentials ✍️
Set up your configuration 🗄️
Schedule automatic updates 🕑
Before you get started, you'll need some information to establish a connection between Atlan and the S3 bucket containing the dbt artifact files, i.e., manifest.json and catalog.json:
S3 Access Key ID* (Optional)
S3 Secret Access Key* (Optional)
GitHub URL* (Optional)
Source Warehouse Credential - Same credentials used in the dbt project
You need to provide S3 credentials only if the bucket does not have the same IAM role as the Atlan cluster bucket.
The Atlan crawler has to be in sync with the artifact files stored in the S3 bucket, so upload the latest files to the bucket before starting the crawler.
The GitHub URL lets you view the SQL code of a model in Git from Atlan.
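One way to act on the sync note above is to check how old an artifact is before starting a crawl. The sketch below reads the `metadata.generated_at` timestamp that recent dbt versions write into manifest.json. The function names and the 24-hour threshold are illustrative assumptions, not part of Atlan or dbt.

```python
import json
from datetime import datetime, timedelta, timezone


def manifest_generated_at(manifest_path):
    """Return the time dbt generated the manifest, as an aware UTC datetime."""
    with open(manifest_path, encoding="utf-8") as fh:
        generated_at = json.load(fh)["metadata"]["generated_at"]
    # dbt writes UTC timestamps ending in "Z"; normalize so that
    # datetime.fromisoformat accepts them on older Python versions too.
    parsed = datetime.fromisoformat(generated_at.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_fresh(manifest_path, max_age=timedelta(hours=24), now=None):
    """True if the manifest is no older than max_age (assumed threshold)."""
    now = now or datetime.now(timezone.utc)
    return now - manifest_generated_at(manifest_path) <= max_age
```

Running `is_fresh("target/manifest.json")` before each upload is one way to catch a stale artifact. Pick a `max_age` that matches how often your dbt jobs run.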
The S3 bucket should have the minimum required permissions:
The artifact files have to be organized by project. An example of the S3 folder structure is below. Other files, such as run_results.json and sources.json, can also be stored in the same structure.
dbt_dump_folder
|
├── project1
│   ├── catalog.json
│   └── manifest.json
├── project2
│   ├── catalog.json
│   └── manifest.json
├── project3
│   ├── catalog.json
│   └── manifest.json
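The folder structure above can be checked locally before anything is uploaded. The following is a minimal sketch, assuming you keep a local copy of the dump folder with one sub-folder per project. The function name `find_incomplete_projects` is hypothetical.

```python
from pathlib import Path

# Files the structure above shows in every project folder.
REQUIRED_ARTIFACTS = ("manifest.json", "catalog.json")


def find_incomplete_projects(dump_folder):
    """Return {project_folder: [missing files]} for incomplete project folders."""
    problems = {}
    for project_dir in sorted(Path(dump_folder).iterdir()):
        if not project_dir.is_dir():
            continue  # ignore stray files at the top level
        missing = [
            name for name in REQUIRED_ARTIFACTS
            if not (project_dir / name).is_file()
        ]
        if missing:
            problems[project_dir.name] = missing
    return problems
```

An empty result means every project folder contains both required files.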
Once you have the prerequisite information listed above, please follow the steps below 👇 to establish a connection and integrate Atlan with dbt.
Once you orchestrate or run jobs to create dbt models in your warehouse, dbt generates artifact files, i.e., manifest.json, catalog.json, run_results.json, and sources.json.
Store these artifact files in the S3 bucket, following the structure recommended in the section above.
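The storage step above could be scripted roughly as follows. This is a sketch under stated assumptions: the artifacts sit in a local directory such as dbt's default `target/` folder, `boto3` is installed and has AWS credentials available, and the bucket name you pass in is your own. The names `plan_uploads` and `upload_artifacts` are illustrative, not an Atlan or dbt API.

```python
from pathlib import Path

# Artifact files named in this guide; only those that exist are uploaded.
ARTIFACT_NAMES = ("manifest.json", "catalog.json", "run_results.json", "sources.json")


def plan_uploads(target_dir, project, root_prefix="dbt_dump_folder"):
    """Map each artifact found in target_dir to <root_prefix>/<project>/<file>."""
    plan = []
    for name in ARTIFACT_NAMES:
        local_path = Path(target_dir) / name
        if local_path.is_file():
            plan.append((str(local_path), f"{root_prefix}/{project}/{name}"))
    return plan


def upload_artifacts(bucket, plan):
    """Upload the planned files to S3 (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so plan_uploads works without boto3

    s3 = boto3.client("s3")
    for local_path, key in plan:
        # Uploading to the same key overwrites the previous artifact file.
        s3.upload_file(local_path, bucket, key)
```

Keeping the planning step separate from the upload lets you print and review the key layout before any file is written to the bucket.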
Log into your Atlan workspace.
On the Home Screen, click on the "New Integration" button in the top right corner. You will see a dialogue box with the list of sources available in your workspace.
Select "DBT" from the list of options, and click on "Next".
You will see an option to either select a preconfigured credential from the drop-down menu or to create a credential. To set up a new connection, click on the "Create Credential" button.
You will be required to fill in your S3 bucket credentials and the location where your dbt artifact files are stored.
Once you have filled in the details, click on "Next".
Add a crawler name. This is a unique name that you can use to refer to this configuration.
Configure your column-level lineage.
Choose whether to run the crawler once or schedule it for a daily, weekly, or monthly run. You will be asked to specify the timezone for the run.
The Atlan crawler has to be in sync with the artifact files stored in the S3 bucket, so upload the latest files to the bucket before the scheduled time.
Click on "Create and Run". Your connection is now created.
Congratulations! You have now integrated Atlan with dbt 🎉
dbt Business Metadata
Table Level Lineage
Column Level Lineage
No, you don't need to add new files in a different folder. Storing the new files in the existing folder will overwrite the existing artifact files.
If you are working in multiple environments, the structure below is recommended:
dbt_dump_folder
|
├── environment1_project1
│   ├── catalog.json
│   └── manifest.json
├── project1
│   ├── catalog.json
│   └── manifest.json
├── project2
│   ├── catalog.json
│   └── manifest.json
├── project3
│   ├── catalog.json
│   └── manifest.json
├── environment2_project3
│   ├── catalog.json
│   └── manifest.json
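The naming pattern in the multi-environment structure above can be generated consistently with a small helper. This is only a sketch: the `environment_project` convention is inferred from the example folder names, and the function name `artifact_folder` is hypothetical.

```python
def artifact_folder(project, environment=None, root_prefix="dbt_dump_folder"):
    """Build the folder path for a project, optionally scoped to an environment."""
    # With an environment the folder is "<environment>_<project>";
    # without one it is just "<project>", as in the example above.
    folder = f"{environment}_{project}" if environment else project
    return f"{root_prefix}/{folder}"


# Example: list the folders for a few (environment, project) pairs.
pairs = [("environment1", "project1"), (None, "project2"), ("environment2", "project3")]
folders = [artifact_folder(project, env) for env, project in pairs]
```

Building the names in one place helps keep the folders in the bucket consistent across environments.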