Set up Databricks (Private preview)
This guide walks through configuring Databricks to work with Atlan's Data Quality Studio by creating the required service principal, setting up authentication, and granting the necessary privileges. Atlan recommends using serverless SQL warehouses for instant compute availability.
System requirements
Before setting up the integration, make sure you meet the following requirements:
- Databricks Premium or Enterprise edition
- Serverless Compute for Jobs & Notebooks enabled
- Dedicated SQL warehouse for running DQ-related queries
- Outbound network access permitted from Serverless Compute (Enterprise tier only)
Prerequisites
Before you begin, complete the following steps:
- Obtain Workspace admin and Metastore Admin or CREATE CATALOG privilege
- Identify your dedicated SQL warehouse for DQ operations
- Create an API token in Atlan; it's stored in Databricks for authentication
- Review Data Quality permissions to understand required privileges
Create service principal
Create the service principal that Atlan uses to perform Data Quality (DQ) operations within your Databricks workspace.
1. Follow the appropriate guide based on your Databricks deployment environment.
2. Store the following credentials securely:
   - `client_id`
   - `client_secret`
   - `tenant_id` (Azure only)
   - Service principal name: Atlan recommends naming it `atlan-dq-service-principal`
3. Set up authentication: choose one of the following authentication methods for your service principal.

   OAuth (recommended):
   - Use the `client_id`, `client_secret`, and `tenant_id` (Azure only) from the service principal created in the previous step
   - No additional configuration is required

   Personal access token (PAT):
   - Follow the Databricks personal access token guide to generate a token for the service principal
   - Store the token securely for use in the next steps
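If you choose OAuth, you can confirm the credentials work before handing them to Atlan by requesting a token through Databricks' OAuth machine-to-machine flow. A minimal sketch, assuming a serverless-compatible workspace; the workspace hostname, client ID, and secret are placeholders:

```shell
# Sketch: verify OAuth M2M credentials by requesting a workspace access token.
# <WORKSPACE_HOST>, <CLIENT_ID>, and <CLIENT_SECRET> are placeholders.
curl --request POST "https://<WORKSPACE_HOST>/oidc/v1/token" \
  --user "<CLIENT_ID>:<CLIENT_SECRET>" \
  --data "grant_type=client_credentials&scope=all-apis"
# A JSON response containing "access_token" means the principal can authenticate.
```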
4. Grant warehouse access: grant the service principal access to the SQL warehouse that's used to run Data Quality queries.
   - Go to your Databricks workspace UI
   - Navigate to SQL > SQL Warehouses
   - Click the warehouse you want Atlan to use
   - Click the Permissions button
   - Select the service principal (`atlan-dq-service-principal`) from the list
   - Assign the Can Use permission
   - Click Add
Once access is granted, Atlan can use this warehouse to run SQL queries related to Data Quality operations.
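If you prefer to script this grant rather than use the UI, the same Can Use permission can be applied through the Databricks Permissions REST API. A sketch under the assumption that the `sql/warehouses` object type applies to your workspace (check the Permissions API reference); all IDs and the token are placeholders:

```shell
# Sketch: grant CAN_USE on a SQL warehouse via the Permissions API.
# <WORKSPACE_HOST>, <WAREHOUSE_ID>, <CLIENT_ID>, and <DATABRICKS_TOKEN>
# are placeholders for your own values.
curl --request PATCH \
  "https://<WORKSPACE_HOST>/api/2.0/permissions/sql/warehouses/<WAREHOUSE_ID>" \
  --header "Authorization: Bearer <DATABRICKS_TOKEN>" \
  --data '{
    "access_control_list": [
      {
        "service_principal_name": "<CLIENT_ID>",
        "permission_level": "CAN_USE"
      }
    ]
  }'
```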
Set up Databricks objects
Create the required Databricks objects needed for the functioning of the Atlan Data Quality Studio.
Create the atlan_dq catalog
The `atlan_dq` catalog is used by Atlan to store metadata, DQ rule execution results, and internal processing tables.
Run the following SQL command in a Databricks notebook or SQL editor:

```sql
CREATE CATALOG IF NOT EXISTS atlan_dq;
```
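To double-check that the catalog exists, you can query Unity Catalog from the newer Databricks CLI (assumed to be installed and authenticated):

```shell
# Sketch: confirm the atlan_dq catalog was created (newer Databricks CLI).
databricks catalogs get atlan_dq
```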
Set up secret scope and secret
Create a Databricks Secret Scope to securely store the Atlan API token. This token enables the service principal to authenticate and interact with Atlan's APIs.
Secret scopes and secret ACLs can only be managed using the Databricks CLI or REST API. These operations aren't supported through SQL.
1. Create a new secret scope named `atlan_dq`:

   ```shell
   databricks secrets create-scope atlan_dq
   ```

2. Save the Atlan API token in a secret named `api_token` in the scope:

   ```shell
   databricks secrets put-secret --json '{
     "scope": "atlan_dq",
     "key": "api_token",
     "string_value": "<ATLAN_API_TOKEN>"
   }'
   ```

   Replace `<ATLAN_API_TOKEN>` with the API token value you created in Atlan.
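As a quick sanity check, you can list the keys in the scope; the secret value itself is never displayed:

```shell
# Sketch: confirm the secret landed in the scope.
databricks secrets list-secrets atlan_dq
# Expect an entry with key "api_token".
```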
Grant privileges
Grant the following privileges to `atlan-dq-service-principal` so it can create internal objects, read the Atlan API token, and query data for quality checks. Replace the placeholders with real values.
1. Manage the `atlan_dq` catalog:

   ```sql
   GRANT USE CATALOG ON CATALOG atlan_dq TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   GRANT CREATE SCHEMA ON CATALOG atlan_dq TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   ```

2. Read the API token stored in the `atlan_dq` secret scope:

   ```shell
   databricks secrets put-acl atlan_dq <SERVICE_PRINCIPAL_CLIENT_ID> READ
   ```
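To verify the ACL took effect, you can read it back for the principal:

```shell
# Sketch: confirm the READ ACL was applied for the service principal.
databricks secrets get-acl atlan_dq <SERVICE_PRINCIPAL_CLIENT_ID>
```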
3. Access data for quality checks (choose one scope):

   Catalog level:

   ```sql
   GRANT USE CATALOG ON CATALOG <CATALOG> TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   GRANT USE SCHEMA ON CATALOG <CATALOG> TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   GRANT SELECT ON CATALOG <CATALOG> TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   ```

   Schema level:

   ```sql
   GRANT USE CATALOG ON CATALOG <CATALOG> TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   GRANT USE SCHEMA ON SCHEMA <SCHEMA> TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   GRANT SELECT ON SCHEMA <SCHEMA> TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   ```

   Table level:

   ```sql
   GRANT USE CATALOG ON CATALOG <CATALOG> TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   GRANT USE SCHEMA ON SCHEMA <SCHEMA> TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   GRANT SELECT ON TABLE <TABLE> TO '<SERVICE_PRINCIPAL_CLIENT_ID>';
   ```
These grants let Atlan create its internal schemas, fetch the API token securely, and run SELECT queries needed for rule execution.
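The scope choice above lends itself to scripting. A minimal sketch that assembles the correct grant statements for one level before you run them in Databricks; the object names and client ID below are made-up placeholders, not values from this guide:

```shell
# Sketch: assemble the Data Quality grant statements for one scope level.
# All names and the client ID are illustrative placeholders.
level="schema"                                   # catalog | schema | table
sp="'11111111-2222-3333-4444-555555555555'"      # service principal client ID
catalog="sales"
schema="sales.orders"                            # fully qualified schema
table="sales.orders.line_items"                  # fully qualified table

# USE CATALOG is needed at every level.
stmts="GRANT USE CATALOG ON CATALOG $catalog TO $sp;"
case "$level" in
  catalog)
    stmts="$stmts
GRANT USE SCHEMA ON CATALOG $catalog TO $sp;
GRANT SELECT ON CATALOG $catalog TO $sp;" ;;
  schema)
    stmts="$stmts
GRANT USE SCHEMA ON SCHEMA $schema TO $sp;
GRANT SELECT ON SCHEMA $schema TO $sp;" ;;
  table)
    stmts="$stmts
GRANT USE SCHEMA ON SCHEMA $schema TO $sp;
GRANT SELECT ON TABLE $table TO $sp;" ;;
esac

# Print the statements, ready to paste into a notebook or SQL editor.
printf '%s\n' "$stmts"
```

Running each level through a helper like this keeps the USE CATALOG / USE SCHEMA / SELECT triple consistent and avoids hand-editing three nearly identical statements.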
Next steps
- Enable data quality on connection - Configure your Databricks connection for data quality monitoring
Need help?
If you have questions or need assistance with setting up Databricks for data quality, reach out to Atlan Support by submitting a support request.
See also
- Configure alerts for data quality rules - Set up real-time notifications for rule failures