Setup and configuration

This document answers common questions about prerequisites, permissions, and environment settings required to run Atlan’s Data Quality Studio on Databricks.

What Databricks edition is required for data quality?

Atlan's Data Quality Studio is supported only on the Premium and Enterprise tiers of Databricks.

What administrative access is required?

The user performing the setup must be:

  • A workspace admin; and
  • A metastore admin, or a user with the CREATE CATALOG privilege on the metastore linked to the workspace
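If the setup user isn't a metastore admin, a metastore admin can grant the privilege directly. A minimal sketch, run from a Databricks notebook (the user email is a placeholder):

```python
# Run as a metastore admin in a Databricks notebook, where `spark` is predefined.
# Grants the CREATE CATALOG privilege on the Unity Catalog metastore to the
# user who will perform the Atlan setup. The email below is a placeholder.
spark.sql("GRANT CREATE CATALOG ON METASTORE TO `setup.user@example.com`")

# Verify the grant took effect.
display(spark.sql("SHOW GRANTS ON METASTORE"))
```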

Is serverless compute required?

Yes, your workspace must have the following feature enabled:

  • Serverless Compute for Jobs & Notebooks

Serverless compute is required to run Atlan's DQ jobs in your Databricks workspace.

You must also identify a dedicated SQL warehouse for running DQ-related queries. Atlan supports any SQL warehouse but recommends a serverless SQL warehouse for faster startup times.
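If you need to create a dedicated warehouse, the sketch below uses the databricks-sdk Python package; the warehouse name and sizing are illustrative assumptions, not Atlan requirements.

```python
# Sketch: create a small serverless SQL warehouse for DQ queries using the
# databricks-sdk package (pip install databricks-sdk).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import CreateWarehouseRequestWarehouseType

w = WorkspaceClient()  # reads credentials from the environment or notebook context

warehouse = w.warehouses.create(
    name="atlan-dq-warehouse",       # placeholder name
    cluster_size="2X-Small",         # smallest size; adjust to your query volume
    max_num_clusters=1,
    auto_stop_mins=10,
    enable_serverless_compute=True,  # serverless for faster startup
    warehouse_type=CreateWarehouseRequestWarehouseType.PRO,  # required for serverless
).result()                           # block until the warehouse is ready

print(f"Warehouse ID: {warehouse.id}")
```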

Is network access configuration required?

Yes. Outbound network access must be allowed from serverless compute. On the Enterprise tier, Databricks serverless compute uses network policies to control outbound traffic; verify that outbound connectivity to Atlan is permitted from the serverless environment.
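One quick way to confirm connectivity is to issue a request to your Atlan tenant from a serverless notebook. A minimal sketch, with a placeholder tenant URL:

```python
# Run from a notebook attached to serverless compute.
# Replace the URL with your actual Atlan tenant; this one is a placeholder.
import requests

try:
    resp = requests.get("https://your-tenant.atlan.com", timeout=10)
    print(f"Reached Atlan, HTTP {resp.status_code}")  # any response means outbound access works
except requests.exceptions.RequestException as e:
    print(f"Blocked or unreachable: {e}")  # a timeout or refusal suggests the network policy blocks Atlan
```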

What Atlan prerequisites are needed?

Before integrating with Databricks, you need to generate an API token in Atlan. This token is stored securely as a Databricks secret and is used to authenticate API requests from within Databricks.
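As a sketch of the storage step, here is how a token can be placed in a Databricks secret scope with the databricks-sdk Python package. The scope and key names are placeholders, not values Atlan mandates:

```python
# Sketch: store the Atlan API token as a Databricks secret.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.secrets.create_scope(scope="atlan")  # placeholder scope name
w.secrets.put_secret(scope="atlan", key="api-token",
                     string_value="<your-atlan-api-token>")

# In a notebook, read it back without exposing the value in code:
# token = dbutils.secrets.get(scope="atlan", key="api-token")
```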

How long does the setup take?

After you complete the setup steps, Atlan takes approximately 10 minutes to finish configuration in the background. Once finished, data quality options appear on your Databricks assets.

Can I use private channels for alerts?

Only public channels are supported for data quality alerts. Alerts can't be routed to private channels or Direct Messages at this time.

Is there a limit on monitored tables in Databricks?

Yes, Atlan currently supports a maximum of 12,000 monitored tables for Databricks.
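As a rough way to gauge your table count against this limit, you can query the Unity Catalog system information schema. A sketch (the catalog filter is an assumption; narrow it to the catalogs you intend to monitor):

```python
# Counts Unity Catalog tables outside the system catalog as a rough gauge
# against the 12,000 monitored-table limit. Run in a Databricks notebook.
row = spark.sql("""
    SELECT COUNT(*) AS table_count
    FROM system.information_schema.tables
    WHERE table_catalog <> 'system'
""").first()
print(f"Tables found: {row['table_count']}")
```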

What column types are supported for the "Run on most recent day of data" feature?

The "Run on most recent day of data" feature requires a timestamp column to determine which rows are recent. For Databricks, the supported column types are:

  • DATE
  • TIMESTAMP_NTZ
  • TIMESTAMP

Columns that don't match these data types never appear in the selector, which prevents accidental configuration errors. The timestamp selection is shared at the table level, so updating it for one rule automatically applies to other rules on the same table that use this feature.
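To preview which of a table's columns would be eligible, you can inspect its Spark schema for these types. A minimal sketch with a placeholder table name (TimestampNTZType requires Spark 3.4 or later):

```python
# Lists columns eligible for "Run on most recent day of data":
# DATE, TIMESTAMP, and TIMESTAMP_NTZ columns of a given table.
from pyspark.sql.types import DateType, TimestampType, TimestampNTZType

ELIGIBLE_TYPES = (DateType, TimestampType, TimestampNTZType)

schema = spark.table("main.sales.orders").schema  # placeholder table
eligible = [f.name for f in schema.fields if isinstance(f.dataType, ELIGIBLE_TYPES)]
print(f"Eligible timestamp columns: {eligible}")
```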

Which rule types don't support the "Run on most recent day of data" feature?

The "Run on most recent day of data" option is unavailable for:

  • Freshness rules that manage their own recency logic
  • Custom SQL rules where user-authored SQL controls filtering

These rule types implement their own time-based filtering logic that conflicts with the automatic filter injection.
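As an illustration of the conflict (not Atlan's actual implementation), consider a custom SQL rule that already restricts itself to recent rows; layering an automatic most-recent-day predicate on top would filter the data twice:

```python
# Illustrative custom SQL rule with its own recency filter. Table and column
# names are placeholders. Injecting an additional "most recent day" predicate
# here would double-filter the data, which is why the option is disabled.
recent_nulls = spark.sql("""
    SELECT COUNT(*) AS null_count
    FROM main.sales.orders
    WHERE order_ts >= current_date() - INTERVAL 1 DAY  -- the rule's own filter
      AND customer_id IS NULL
""")
display(recent_nulls)
```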