
dbt

Steps to integrate your dbt models with Atlan

Atlan supports integration with dbt, allowing you to bring your dbt models into your Atlan workspace.

💭 TL;DR

You can set up a dbt integration with your Atlan workspace in five easy steps:

  1. Store your dbt artifact files, structured by project, in an S3 bucket ☁️

  2. Select the source in Atlan, aka dbt 😉

  3. Provide your credentials ✏️

  4. Set up your configuration 🗄️

  5. Schedule automatic updates 🕑

📜 Prerequisites for dbt integration

Before you get started, you'll need some information to help establish a connection between Atlan and the S3 bucket containing your dbt artifact files, i.e. manifest.json and catalog.json.

Note:

  1. You need to provide S3 credentials only if the bucket does not share the same IAM role as the Atlan cluster bucket.

  2. The Atlan crawler has to be in sync with the artifact files stored in the S3 bucket, so it is recommended to place the updated files in the bucket before starting the crawler.

  3. Providing a GitHub URL will allow you to view a model's SQL code in Git via Atlan.

🔑 S3 bucket permissions

The S3 bucket should grant the following minimum permissions (a sketch for verifying them follows the list):

  • ListObjects

  • GetBucketLocation

  • GetObject
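
A minimal Python sketch for verifying these permissions with the same credentials Atlan will use. It assumes boto3 is installed and configured; the bucket name and object key below are hypothetical placeholders, so adjust them to your setup:

# A quick permission check, run with the credentials Atlan will use.
import boto3

BUCKET = "my-dbt-artifacts"                     # hypothetical bucket name
KEY = "dbt_dump_folder/project1/manifest.json"  # hypothetical object key

s3 = boto3.client("s3")

# ListObjects: enumerate keys under the artifact prefix
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="dbt_dump_folder/")
print("ListObjects OK:", resp.get("KeyCount", 0), "objects")

# GetBucketLocation: read the bucket's region
loc = s3.get_bucket_location(Bucket=BUCKET)
print("GetBucketLocation OK:", loc.get("LocationConstraint") or "us-east-1")

# GetObject: fetch one artifact file
obj = s3.get_object(Bucket=BUCKET, Key=KEY)
print("GetObject OK:", len(obj["Body"].read()), "bytes")

If any of these calls raises an AccessDenied error, the corresponding permission is missing from the bucket policy or IAM role.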

๐Ÿ“ S3 folder structure

The artifact files have to be organized by project. An example of the S3 folder structure is shown below; other files, such as run_results.json and sources.json, can also be stored in the same structure. A sketch for verifying the layout follows the example.

dbt_dump_folder
│
├── project1
│   ├── catalog.json
│   └── manifest.json
├── project2
│   ├── catalog.json
│   └── manifest.json
└── project3
    ├── catalog.json
    └── manifest.json
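
To sanity-check the layout before pointing Atlan at it, a minimal sketch along these lines can list each project prefix and flag missing artifact files (again assuming boto3; the bucket name and root prefix are placeholders):

import boto3

BUCKET = "my-dbt-artifacts"  # hypothetical bucket name
ROOT = "dbt_dump_folder/"    # hypothetical root prefix

s3 = boto3.client("s3")

# Treat each common prefix under the root as one project folder
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=ROOT, Delimiter="/")
for cp in resp.get("CommonPrefixes", []):
    project = cp["Prefix"]
    contents = s3.list_objects_v2(Bucket=BUCKET, Prefix=project).get("Contents", [])
    names = {obj["Key"].rsplit("/", 1)[-1] for obj in contents}
    missing = {"catalog.json", "manifest.json"} - names
    print(project, "OK" if not missing else f"missing: {sorted(missing)}")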

🚀 The step-by-step guide to integrate dbt with Atlan

Once you have the prerequisite information listed above, please follow the steps below 👇 to establish a connection and integrate Atlan with dbt.

STEP 1: Store dbt artifact files in the S3 bucket

  1. Once you orchestrate or run jobs to create dbt models in your warehouse, dbt will generate artifact files, i.e. manifest.json, catalog.json, run_results.json, and sources.json.

  2. Store these artifact files in the S3 bucket as per the structure recommended in the section above (see the sketch after this list).
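
As an illustration of what this step can look like in practice, here is a minimal sketch that uploads the artifacts after a dbt run. It assumes boto3 and the hypothetical names below; by default dbt writes its artifacts to the target/ directory:

from pathlib import Path
import boto3

BUCKET = "my-dbt-artifacts"              # hypothetical bucket name
S3_PREFIX = "dbt_dump_folder/project1"   # per-project folder from the structure above
TARGET_DIR = Path("target")              # dbt's default artifact directory

s3 = boto3.client("s3")

for name in ("manifest.json", "catalog.json", "run_results.json", "sources.json"):
    local = TARGET_DIR / name
    if local.exists():  # not every dbt command produces every artifact
        s3.upload_file(str(local), BUCKET, f"{S3_PREFIX}/{name}")
        print(f"uploaded {name} -> s3://{BUCKET}/{S3_PREFIX}/{name}")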

STEP 2: Select the source

  1. Log into your Atlan workspace.

  2. On the Home Screen, click on the "New Integration" button in the top right corner. You will see a dialog box with the list of sources available in your workspace.

  3. Select "DBT" from the list of options, and click on "Next".

STEP 3: Provide credentials

  1. You will see an option to either select a preconfigured credential from the drop-down menu or create a new one. To set up a new connection, click on the "Create Credential" button.

  2. You will be required to fill in your S3 bucket credentials and the location where your dbt artifact files are stored.

  3. Once you have filled in the details, click on "Next".

STEP 4: Set up your configuration

  1. Add a crawler name. This is the unique name you can set up as a reference to this configuration.

  2. Configure your column-level lineage.

STEP 5: Schedule automatic updates

  1. Choose whether to run the crawler once or schedule it for a daily, weekly, or monthly run. You will be asked to specify the timezone for the run.

  2. It is recommended to place the updated files in the S3 bucket before the scheduled time, since the Atlan crawler has to be in sync with the artifact files stored there.

  3. Click on "Create and Run". Your connection is now created.

Congratulations! You have now integrated Atlan with dbt 🎉

🙋 FAQ

A) What metadata are we enriching from dbt?

  1. dbt Business Metadata

    1. Model Name

    2. Description

    3. Tags

    4. Owner

    5. Package

    6. URL

    7. Materialized

    8. Unique ID

    9. Relation

    10. Resource Type

    11. DBT Connection

  2. dbt lineage

    1. Table Level Lineage

    2. Column Level Lineage
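
Most of the business metadata listed above lives in manifest.json. As an illustration only (not Atlan's internal logic), here is a minimal sketch that reads a manifest and prints these fields for each model; the file path is a placeholder:

import json

# Load a manifest.json produced by dbt
with open("target/manifest.json") as f:
    manifest = json.load(f)

# Models live under "nodes", keyed by their unique IDs
for unique_id, node in manifest.get("nodes", {}).items():
    if node.get("resource_type") != "model":
        continue
    print({
        "model_name": node.get("name"),
        "description": node.get("description"),
        "tags": node.get("tags"),
        "package": node.get("package_name"),
        "unique_id": unique_id,
        "resource_type": node.get("resource_type"),
    })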

B) Do we need to add new artifact files to a different S3 folder every time they are generated?

No, you don't need to add new files to a different folder. Storing the new files in the existing folder will overwrite the existing artifact files.

C) I have different environments in my dbt project, how should I structure my S3 Bucket in this case?

If you are working in multiple environments, the structure below is recommended:

dbt_dump_folder
│
├── environment1_project1
│   ├── catalog.json
│   └── manifest.json
├── project1
│   ├── catalog.json
│   └── manifest.json
├── project2
│   ├── catalog.json
│   └── manifest.json
├── project3
│   ├── catalog.json
│   └── manifest.json
└── environment2_project3
    ├── catalog.json
    └── manifest.json