Set up dbt Core

Option 1: Use the Atlan bucket

To avoid access issues, we recommend uploading the required files to the same bucket as Atlan. If you opt to use your own bucket instead, you will need to complete the steps outlined in Option 2 below.

Amazon S3

Raise a support request to get the details of your Atlan S3 bucket. In your request, include the ARN of the IAM user or IAM role that we can provision access to.

Create IAM policy

You will need to create an IAM policy and attach it to the IAM user or role so that it can upload the required files to your Atlan bucket. To create an IAM policy with the necessary permissions, follow the steps in the AWS Identity and Access Management User Guide.

Create the policy using the following JSON:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>/*"
      ],
      "Effect": "Allow"
    },
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>"
      ],
      "Effect": "Allow"
    }
  ]
}
  • Replace <bucket_name> with the name of your Atlan bucket.
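
Once the policy is attached, you can upload the files with any S3 client authenticated as that IAM user or role. The following is a minimal sketch using Python and boto3; the bucket name, prefix, project, and job names are placeholders that Atlan support will share with you or that you choose yourself.

    import boto3

    # Placeholders: Atlan support will share the bucket name and prefix with you.
    BUCKET_NAME = "<bucket_name>"
    PREFIX = "<prefix>"

    # Uses credentials from the environment for the IAM user or role that has
    # the policy above attached.
    s3 = boto3.client("s3")

    # Upload the dbt artifacts for one job into its own folder.
    for filename in ("manifest.json", "run_results.json"):
        s3.upload_file(
            Filename=f"target/{filename}",
            Bucket=BUCKET_NAME,
            Key=f"{PREFIX}/project1/job1/{filename}",
        )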

Google Cloud Storage

To use Atlan's Google Cloud Storage (GCS) bucket, complete the following steps.

Create a new service account

To create a new service account:

  1. Open the Google Cloud console.
  2. From the left menu under IAM and admin, click Service accounts.
  3. Select a Google Cloud project.
  4. From the upper left of the Service accounts page, click Create Service Account.
  5. For Service account details, enter the following details:
    1. For Service account name, enter a service account name to display in the Google Cloud console.
    2. For Service account ID, the Google Cloud console generates a service account ID based on this name. Edit the ID if necessary - the ID cannot be changed later.
    3. (Optional) For Service account description, enter a description for the service account.
    4. Click Create and continue.

Notify Atlan support

Raise a support request to share the username of the service account with Atlan. The username will be in the following format - <service-account-name>@<project-id>.iam.gserviceaccount.com.

The Atlan support team will provide you with read and write access to a particular folder in the Atlan GCS bucket. Once Atlan has granted access, you can use the service account to upload the required files.
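
With access granted, you can upload the files with any GCS client authenticated as that service account. The following is a minimal sketch using Python and the google-cloud-storage library; the key file path, bucket name, and folder are placeholders, and Atlan support will share the bucket and folder with you.

    from google.cloud import storage

    # Placeholders: Atlan support will share the bucket name and folder with you.
    BUCKET_NAME = "<atlan_gcs_bucket>"
    FOLDER = "<folder>"

    # Authenticate as the service account you created (key file path is a placeholder).
    client = storage.Client.from_service_account_json("path/to/service-account-key.json")
    bucket = client.bucket(BUCKET_NAME)

    # Upload the dbt artifacts for one job into its own folder.
    for filename in ("manifest.json", "run_results.json"):
        blob = bucket.blob(f"{FOLDER}/project1/job1/{filename}")
        blob.upload_from_filename(f"target/{filename}")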

Option 2: Use your own bucket

Amazon S3

danger

S3 buckets with VPC endpoints currently do not support cross-region requests. This may result in workflows not picking up objects from your bucket. Atlan also recommends disabling ACLs on your S3 bucket when using this method. Having ACLs enabled may prevent the bucket owner from accessing the stored objects.

You'll first need to create a cross-account bucket policy that gives Atlan's IAM role access to your bucket. A cross-account bucket policy is required because your Atlan tenant and S3 bucket may not always be deployed in the same AWS account. The permissions required on the S3 bucket are s3:GetBucketLocation, s3:ListBucket, and s3:GetObject.

To create a cross-account bucket policy:

  1. Raise a support ticket to get the ARN of the Node Instance Role for your Atlan EKS cluster.

  2. Create a new policy to allow access by this ARN and update your bucket policy with the following (a scripted sketch follows these steps):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Principal": {
            "AWS": "<role-arn>"
          },
          "Action": [
            "s3:GetBucketLocation",
            "s3:ListBucket",
            "s3:GetObject"
          ],
          "Resource": [
            "arn:aws:s3:::<bucket-name>",
            "arn:aws:s3:::<bucket-name>/<prefix>/*"
          ]
        }
      ]
    }
    • Replace <role-arn> with the role ARN of Atlan's node instance role.
    • Replace <bucket-name> with the name of the bucket you are creating.
    • Replace <prefix> with the name of the prefix (directory) within that bucket where you will upload the files.
  3. Once the new policy has been set up, please notify the support team. Your request should include the S3 bucket name and prefix. This should be done prior to setting up the workflow so that we can create and attach an IAM policy for your bucket to Atlan's IAM role.
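
If you manage your bucket configuration in code rather than through the AWS console, the same bucket policy can be applied with boto3, as referenced in step 2. This is only a sketch; <role-arn>, <bucket-name>, and <prefix> are the same placeholders as above.

    import json

    import boto3

    ROLE_ARN = "<role-arn>"        # Atlan's node instance role ARN (from support)
    BUCKET_NAME = "<bucket-name>"
    PREFIX = "<prefix>"

    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Principal": {"AWS": ROLE_ARN},
                "Action": ["s3:GetBucketLocation", "s3:ListBucket", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{BUCKET_NAME}",
                    f"arn:aws:s3:::{BUCKET_NAME}/{PREFIX}/*",
                ],
            }
        ],
    }

    # put_bucket_policy overwrites any existing bucket policy, so merge
    # statements first if your bucket already has one.
    boto3.client("s3").put_bucket_policy(Bucket=BUCKET_NAME, Policy=json.dumps(policy))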

(Optional) Update KMS policy

If your S3 bucket is encrypted, you will need to update your KMS policy. This will allow Atlan to decrypt the objects in your S3 bucket.

  1. Provide the KMS key ARN and KMS key alias ARN to the Atlan support team. The KMS key that you provide must be a customer managed KMS key. (This is because you can only change the key policy for a customer managed KMS key, and not for an AWS managed KMS key. Refer to AWS documentation to learn more.)

  2. To whitelist the ARN of Atlan's node instance role, update the KMS key policy with the following (a scripted sketch follows these steps):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DecryptCrossAccount",
          "Effect": "Allow",
          "Principal": {
            "AWS": "<role-arn>"
          },
          "Action": [
            "kms:Decrypt",
            "kms:DescribeKey"
          ],
          "Resource": "*"
        }
      ]
    }
  • Replace <role-arn> with the role ARN of Atlan's node instance role.
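
As referenced in step 2, the statement can also be appended to the key's existing policy with boto3. This is a minimal sketch, assuming the default key policy name and using placeholders for the key and role ARNs.

    import json

    import boto3

    KMS_KEY_ARN = "<kms-key-arn>"  # placeholder: your customer managed KMS key
    ROLE_ARN = "<role-arn>"        # Atlan's node instance role ARN (from support)

    kms = boto3.client("kms")

    # Fetch the current key policy (customer managed keys use the "default" policy name).
    current = json.loads(kms.get_key_policy(KeyId=KMS_KEY_ARN, PolicyName="default")["Policy"])

    # Append the cross-account decrypt statement and write the policy back.
    # Assumes "Statement" is a list, as in the default key policy.
    current["Statement"].append(
        {
            "Sid": "DecryptCrossAccount",
            "Effect": "Allow",
            "Principal": {"AWS": ROLE_ARN},
            "Action": ["kms:Decrypt", "kms:DescribeKey"],
            "Resource": "*",
        }
    )
    kms.put_key_policy(KeyId=KMS_KEY_ARN, PolicyName="default", Policy=json.dumps(current))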

Google Cloud Storage

To use your own GCS bucket, complete the following steps.

Create a new bucket

To create a new bucket:

  1. In the Google Cloud console, go to the Cloud Storage page.
  2. From the left menu of the Cloud Storage page, click Buckets.
  3. From the top header of the Buckets page, click the Create button.
  4. On the Create a bucket page, enter the following details:
    1. In the Get Started section, enter a globally unique name that follows the bucket naming conventions.
    2. In the Choose where to store your data section, for Location type, select the relevant option.
    3. In the Choose a storage class for your data section, select the relevant option.
  5. At the bottom of the form, click the Create button.
  6. From the Buckets page, click the name of the bucket to which you want to upload files.
  7. In the Objects tab for the bucket, click Upload files to upload the required files. Ensure that the files are either stored in the root directory of your bucket or inside a structured folder.
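
You can also upload the files programmatically instead of through the console. The upload follows the same pattern as the earlier GCS sketch; a minimal example, assuming Application Default Credentials with write access to your bucket (bucket name and folder are placeholders):

    from google.cloud import storage

    BUCKET_NAME = "<your_gcs_bucket>"   # placeholder: your bucket name
    FOLDER = "project1/job1"            # placeholder: folder for one dbt job

    # Uses Application Default Credentials with write access to the bucket.
    bucket = storage.Client().bucket(BUCKET_NAME)

    for filename in ("manifest.json", "run_results.json"):
        bucket.blob(f"{FOLDER}/{filename}").upload_from_filename(f"target/{filename}")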

Request Atlan's details

Raise a support request for Atlan to assign a service account username for your tenant. Atlan will provide you with the username of the service account in the following format - <service-account-name>@<project-id>.iam.gserviceaccount.com.

Create a custom role

You will need to create a custom role in the Google Cloud console to allow Atlan's service account to list and read objects in your bucket.

To create a custom role:

  1. Open the Google Cloud console.
  2. From the left menu under IAM and admin, click Roles.
  3. Using the dropdown list at the top of the page, select the project in which you want to create a role.
  4. From the upper left of the Roles page, click Create Role.
  5. In the Create role page, enter the following details:
    1. For Title, enter a meaningful name for the custom role - for example, AtlanStorageAccessRole.
    2. (Optional) For Description, enter a description for the custom role.
    3. For ID, the Google Cloud console generates a custom role ID based on the custom role name. Edit the ID if necessary - the ID cannot be changed later.
    4. (Optional) For Role launch stage, assign a stage for the custom role - for example, Alpha, Beta, or General Availability.
    5. Click Add permissions to select the permissions you want to include in the custom role. In the Add permissions dialog, click the Enter property name or value filter and add the following permissions:
      • storage.objects.list allows Atlan to list objects in your bucket.
      • storage.objects.get allows Atlan to retrieve objects from your bucket.
    6. Click Create to complete the custom role setup.

Assign custom role to Atlan service account

To assign the custom role you created to the Atlan service account:

  1. In the Google Cloud console, go to the Cloud Storage page.
  2. From the left menu of the Cloud Storage page, click Buckets.
  3. From the top header of the Buckets page, click the name of the bucket to which you want to assign the custom role.
  4. From the top of the page, click the Permissions tab.
  5. In the Permissions page, click the Grant access button.
  6. In the Add principals dialog, configure the following:
    1. In the New principals field, enter the service account username provided by Atlan to grant it access to your bucket.
    2. For Select a role, click the dropdown to select the custom role you created for the Atlan service account.
    3. Click Save.
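
Alternatively, you can grant the custom role to Atlan's service account programmatically by updating the bucket's IAM policy. The following is a minimal sketch using the google-cloud-storage Python client; the bucket name, project ID, role ID, and service account address are placeholders.

    from google.cloud import storage

    BUCKET_NAME = "<your_gcs_bucket>"                                    # placeholder
    CUSTOM_ROLE = "projects/<project-id>/roles/AtlanStorageAccessRole"   # placeholder role ID
    ATLAN_SA = "serviceAccount:<atlan-service-account-username>"         # from Atlan support

    bucket = storage.Client().bucket(BUCKET_NAME)

    # Read the current bucket IAM policy, add a binding for Atlan's service
    # account with the custom role, and save it back.
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append({"role": CUSTOM_ROLE, "members": {ATLAN_SA}})
    bucket.set_iam_policy(policy)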

Notify Atlan support

Raise a support request to provide the following details:

  • The name of your GCS bucket.
  • Confirmation that the Atlan service account has been granted access to your bucket.

Structure the bucket

Atlan uses the metadata.invocation_id and metadata.project_id attributes to uniquely identify and link the uploaded files. Atlan does not use the file paths to identify a project or job that the file belongs to. The following directory structure is provided as a guideline:

Multiple projects

Atlan supports extracting dbt metadata from multiple dbt projects. The main-prefix has the format s3://<BUCKET_NAME>/<PATH_PREFIX>; the Atlan support team will provide it to you after setting up access policies on your bucket.

You will need to use the following directory structure:

main-prefix

  • project1
    • job1
      • manifest.json
      • other files
    • job2
      • manifest.json
      • other files
  • project2
    • job3
      • manifest.json
      • other files
    • job4
      • manifest.json
      • other files
  • project3
    • job5
      • manifest.json
      • other files

Single project

Even if you have a single dbt project, Atlan recommends that you follow the directory structure above.
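
Because Atlan links the uploaded files through metadata.invocation_id and metadata.project_id rather than their paths, it can help to confirm that the artifacts you place in a single job folder come from the same dbt invocation before uploading. The following is a minimal sketch of such a check in Python, with a placeholder job folder.

    import json
    from pathlib import Path

    # Placeholder: a local copy of one job folder, e.g. project1/job1 from the layout above.
    job_dir = Path("project1/job1")

    manifest = json.loads((job_dir / "manifest.json").read_text())
    run_results = json.loads((job_dir / "run_results.json").read_text())

    # Both files carry metadata.invocation_id; Atlan relies on it (and project_id)
    # to link the files for a job.
    assert manifest["metadata"]["invocation_id"] == run_results["metadata"]["invocation_id"], (
        "manifest.json and run_results.json are from different dbt invocations"
    )
    print("project_id:", manifest["metadata"]["project_id"])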

Upload project files

To load correct metadata, Atlan processes the manifest.json and run_results.json files for each job. There are many ways to load the metadata; the approaches below are suggested by Atlan, and a scripted upload sketch follows at the end of this section.

You will need to upload the files from the target directory of the dbt project into distinct folders.

Upload the run artifacts generated from the following commands:

  • (Required) Compilation results:

    dbt compile --full-refresh
    • This command will generate files that contain a full representation of your dbt project's resources, including models, tests, macros, node configurations, resource properties, and more.
    • Files to upload - manifest.json and run_results.json
    • Alternatively, you can upload the same files by running the dbt run --full-refresh command.
  • (Optional) Test results:

    dbt test
    • This command will execute all dbt tests in a dbt project and generate files that contain the test results.
    • Files to upload - manifest.json and run_results.json
  • (Optional) Catalog:

    dbt docs generate
    • This command will generate metadata about the tables and views produced by the models in your dbt project - for example, column data types and table statistics.
    • Files to upload - manifest.json and catalog.json
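
As referenced above, here is a minimal end-to-end sketch in Python using boto3. It assumes the script runs from the dbt project root; the bucket, path prefix, project, and job names are placeholders, and the object keys follow the bucket structure described earlier.

    import subprocess

    import boto3

    # Placeholders: Atlan support provides the main-prefix (s3://<BUCKET_NAME>/<PATH_PREFIX>).
    BUCKET_NAME = "<BUCKET_NAME>"
    PATH_PREFIX = "<PATH_PREFIX>"
    PROJECT, JOB = "project1", "job1"

    # Run the required compilation step from the dbt project root; it writes the
    # artifacts into the target/ directory.
    subprocess.run(["dbt", "compile", "--full-refresh"], check=True)

    # Upload the artifacts for this job. The same pattern applies to the optional
    # `dbt test` output (manifest.json and run_results.json) and the optional
    # `dbt docs generate` output (manifest.json and catalog.json).
    s3 = boto3.client("s3")
    for filename in ("manifest.json", "run_results.json"):
        s3.upload_file(
            Filename=f"target/{filename}",
            Bucket=BUCKET_NAME,
            Key=f"{PATH_PREFIX}/{PROJECT}/{JOB}/{filename}",
        )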