Troubleshooting Databricks connectivity
This documentation uses compute engine below to refer to both SQL endpoints and interactive clusters.
Does Atlan consider expensive queries and compute costs?
No, Atlan doesn't factor in expensive queries or compute costs due to limitations in the Databricks APIs, which don't expose this information.
How does Atlan calculate popularity for Databricks assets?
Atlan calculates popularity for tables, views, and columns in Databricks by analyzing query execution data. It retrieves query history from the system.query.history table and filters for execution_status = 'FINISHED' and statement_type = 'SELECT' to determine how frequently assets are accessed.
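As a minimal sketch of the kind of query this logic relies on, the following uses the databricks-sql-connector package to pull recent successful SELECT statements; the connection values are placeholders, and the aggregation Atlan applies on top isn't shown.

```python
from databricks import sql

# Placeholders: replace with your workspace host, warehouse HTTP path, and token.
with sql.connect(
    server_hostname="<workspace-host>",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        # The same filters the popularity calculation uses.
        cur.execute("""
            SELECT statement_text, executed_by, end_time
            FROM system.query.history
            WHERE execution_status = 'FINISHED'
              AND statement_type = 'SELECT'
            ORDER BY end_time DESC
            LIMIT 100
        """)
        for row in cur.fetchall():
            print(row)
```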
How do I debug test authentication and preflight check errors?
Hostname resolution error
Provided Host name cannot be resolved via DNS, please check and try again.
- The hostname you have provided can't be resolved through DNS. Check that the hostname is correct.
- Verify that the DNS settings have been configured properly.
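To confirm whether the hostname resolves at all, a quick local check like this sketch can help (the hostname is hypothetical):

```python
import socket

host = "adb-1234567890123456.7.azuredatabricks.net"  # hypothetical hostname

try:
    infos = socket.getaddrinfo(host, 443)
    print("Resolved to:", sorted({info[4][0] for info in infos}))
except socket.gaierror as exc:
    # Resolution failure here matches the error Atlan reports.
    print(f"DNS resolution failed for {host}: {exc}")
```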
Invalid client ID or secret
Provided Client ID is invalid, please check and try again.
- The client ID or secret you have provided is either invalid or no longer working. Follow the steps for AWS or Azure setup to generate new credentials.
Invalid tenant ID
Provided tenant ID is invalid, please check and try again.
- The tenant ID you have provided is incorrect.
- Ensure that the tenant ID you have provided corresponds to the one in your Microsoft Entra ID application.
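For Azure setups, you can verify the tenant ID, client ID, and client secret together by requesting a token from the Microsoft Entra ID endpoint. This is a sketch with placeholder values; the scope is the well-known Azure Databricks resource ID.

```python
import requests

tenant_id = "<tenant-id>"          # placeholder
client_id = "<client-id>"          # placeholder
client_secret = "<client-secret>"  # placeholder

resp = requests.post(
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Well-known resource ID for Azure Databricks
        "scope": "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default",
    },
)
# 200 means the tenant ID, client ID, and secret are all valid.
print(resp.status_code, resp.json().get("error_description", "token issued"))
```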
Unity Catalog not linked
Configured Databricks instance doesn't have Unity Catalog linked. Please choose JDBC extraction instead of REST API in Atlan.
- If you have not set up Unity Catalog in your Databricks workspace, you can change the extraction method to JDBC instead of REST API to crawl your Databricks assets in Atlan.
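One way to check whether a Unity Catalog metastore is assigned to the workspace is the metastore summary endpoint; values below are placeholders.

```python
import requests

host = "https://<workspace-host>"   # placeholder
token = "<personal-access-token>"   # placeholder

resp = requests.get(
    f"{host}/api/2.1/unity-catalog/metastore_summary",
    headers={"Authorization": f"Bearer {token}"},
)
# A 200 with a metastore_id suggests Unity Catalog is linked; an error
# response suggests no metastore is assigned, so use JDBC extraction instead.
print(resp.status_code, resp.json())
```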
Connection timeout
Failed to connect to Databricks (connection timed out). Please check your host and port and try again.
- The connection to the Databricks instance has timed out.
- Verify that the host and port are correct.
- Check that no firewall rules or network issues are blocking the connection.
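A basic reachability test from a machine with similar network access can distinguish a firewall or routing issue from a Databricks-side problem (host and port are placeholders):

```python
import socket

host, port = "<workspace-host>", 443  # placeholders

try:
    with socket.create_connection((host, port), timeout=10):
        print(f"TCP connection to {host}:{port} succeeded")
except OSError as exc:
    # A timeout here usually points at firewall rules or network routing.
    print(f"Connection to {host}:{port} failed: {exc}")
```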
Invalid HTTP path
Provided HTTP path is invalid, please check and try again.
- The HTTP path you have provided is invalid.
- Ensure that the endpoint is properly configured and accessible, and the warehouse ID in the HTTP path is correct.
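SQL warehouse HTTP paths typically follow the pattern /sql/1.0/warehouses/<warehouse-id>. A minimal round-trip sketch with the databricks-sql-connector package and placeholder values:

```python
from databricks import sql

# Placeholders: replace with your workspace host, HTTP path, and token.
with sql.connect(
    server_hostname="<workspace-host>",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        print(cur.fetchone())  # a successful round trip confirms the HTTP path
```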
Invalid personal access token
PAT token is invalid, please check and try again.
- The personal access token used for authentication is invalid.
- Ensure that the token is valid and neither deleted nor expired.
- You can also generate a new personal access token, if needed.
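One quick way to confirm a PAT is still valid is to call the workspace's SCIM Me endpoint with it (placeholder values below):

```python
import requests

host = "https://<workspace-host>"   # placeholder
token = "<personal-access-token>"   # placeholder

resp = requests.get(
    f"{host}/api/2.0/preview/scim/v2/Me",
    headers={"Authorization": f"Bearer {token}"},
)
# 200 means the token works; 401/403 suggests it is expired, deleted, or revoked.
print(resp.status_code, resp.text[:200])
```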
Insufficient permissions for crawling metadata
User doesn't have access to any schemas / dbs, please check the accesses provided to the atlan user and try again.
- Check that the service principal or the user whose PAT is being used has the necessary permissions, for example with the sketch below. Refer to the setup documentation for the permissions required by each authentication type.
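To see what the Atlan credential can actually access, connect as that principal and list the visible catalogs; this sketch uses the databricks-sql-connector package with placeholder values.

```python
from databricks import sql

# Run this with the same PAT or service principal credentials Atlan uses.
with sql.connect(
    server_hostname="<workspace-host>",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<atlan-credential-pat>",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SHOW CATALOGS")
        for (catalog,) in cur.fetchall():
            print(catalog)  # an empty result explains the preflight failure
```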
Insufficient permissions for some of the objects included in metadata crawling
Warning, user doesn't have access to the following objects anymore, or the objects no longer exist on the source!, check failed for ...
- The user doesn't have access to one or more database objects (such as catalogs or schemas) in the include filter.
- Remove these objects from the include filter if they no longer exist on the source.
- Otherwise, check that the service principal or the user whose PAT is being used has the necessary permissions. Refer to the setup documentation for the permissions required by each authentication type.
Insufficient permissions to crawl tags
User doesn't have access to the following system tables
- Check that sufficient permissions have been granted for tag extraction.
User doesn't have permission to access warehouses
please check your credentials and warehouse access
- Check that the configured user or service principal has CAN_USE on the configured SQL warehouse.
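You can inspect a warehouse's access control list through the Permissions API to confirm the grant; this sketch uses placeholder values and needs a credential allowed to read the warehouse's permissions.

```python
import requests

host = "https://<workspace-host>"  # placeholder
token = "<admin-pat>"              # placeholder
warehouse_id = "<warehouse-id>"    # placeholder

resp = requests.get(
    f"{host}/api/2.0/permissions/warehouses/{warehouse_id}",
    headers={"Authorization": f"Bearer {token}"},
)
# Look for the Atlan principal with permission_level CAN_USE in the ACL.
print(resp.json())
```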
Unable to access query history from the source, user doesn't have the access
- Check that the permissions required for system tables-based lineage extraction have been granted (see the grants sketch after the next entry).
System table extraction checks failing with:
User doesn't have access to the following system tables
- Check that the permissions required for system tables-based extraction have been granted.
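As a sketch of the kind of Unity Catalog grants system-table extraction (including query history for lineage) typically needs, run statements like these as a metastore admin; the principal name, schema, and connection values are placeholders, and the exact set depends on your setup.

```python
from databricks import sql

# Placeholder principal and grants; adjust to the system schemas you extract.
grants = [
    "GRANT USE CATALOG ON CATALOG system TO `atlan-service-principal`",
    "GRANT USE SCHEMA ON SCHEMA system.query TO `atlan-service-principal`",
    "GRANT SELECT ON TABLE system.query.history TO `atlan-service-principal`",
]

with sql.connect(
    server_hostname="<workspace-host>",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<metastore-admin-pat>",
) as conn:
    with conn.cursor() as cur:
        for stmt in grants:
            cur.execute(stmt)
```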
General connection failure
Unable to connect to the configured Databricks instance, please check your credentials and configs and then try again. If the problem persists, contact [email protected].
- Check that you have entered the host and port correctly.
- Verify that the credentials for the connection are correct.
- Check that your Databricks instance is properly configured and available.
- If the problem still persists after verifying all of the previous steps, contact Atlan support.
Why does the workflow take longer than usual in the extraction step?
- Certain Databricks runtime versions don't have an easy way to extract some metadata (for example, partitioning, table_type, and format). Extra operations must be performed to retrieve these, resulting in slower performance.
- If you aren't already, you may want to try the Unity Catalog extraction method.
Why is some metadata missing?
- When using incremental extraction, consider running a one-time full extraction to capture any newly introduced metadata.
- Currently, some metadata can't be extracted from Databricks:

| Metadata | JDBC | REST API | System Tables |
| --- | --- | --- | --- |
| ViewCount and TableCount (on schemas) | ❌ | ✅ | ✅ |
| RowCount (on tables and views) | ❌ | ❌ | ❌ |
| TABLE_KIND (on tables and views) | ❌ | ❌ | ❌ |
| PARTITION_STRATEGY (on tables and views) | ❌ | ❌ | ❌ |
| CONSTRAINT_TYPE (on columns) | ❌ | ✅ | ✅ |
| Partition key (on columns) | ❌ | ✅ | ✅ |
| Table partitioning information | ✅ | ❌ | ✅ |
| BYTES, SIZEINBYTES (table size) | ❌ | ❌ | ❌ |

- The team is exploring ways to bring this metadata into Atlan if Databricks supports its extraction.
Why doesn't my SQL work when querying Databricks?
- Atlan currently supports SparkSQL on Databricks runtime 7.x and above.
Can I use Atlan when the Databricks compute engine isn't running?
- Atlan needs the Databricks compute engine to be running for two activities:
- Crawling assets (normal and scheduled run)
- Querying assets (including data previews)
- If you don't need to perform the activities listed, your experience shouldn't be affected.
- In any other case, you'll get a downgraded experience on Atlan if the compute engine isn't running. Queries won't work as expected and a scheduled workflow might fail after a couple of retries.
- The team recommends turning off the Terminate after x minutes of inactivity option in your cluster to avoid these problems. If you have this turned on, any of the listed activities triggers the cluster to come back online within about 30 seconds.
Why can't I see all the assets on Atlan that are available in Databricks?
- Have you excluded the database or schema when crawling?
- Does the Databricks user you configured for crawling have access to these other assets?
Why is the test authentication taking so long?
- Please check the state of the compute engine. It must be in a running state for all operations, including authentication.
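If the compute engine is a SQL warehouse, you can check its state via the SQL Warehouses API (placeholder values below):

```python
import requests

host = "https://<workspace-host>"  # placeholder
token = "<personal-access-token>"  # placeholder
warehouse_id = "<warehouse-id>"    # placeholder

resp = requests.get(
    f"{host}/api/2.0/sql/warehouses/{warehouse_id}",
    headers={"Authorization": f"Bearer {token}"},
)
print(resp.json().get("state"))  # e.g. RUNNING, STARTING, STOPPED
```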
What limitations are there with the REST API (Unity Catalog) extraction method?
- Currently, schema-level filtering and retrieving table partitioning information aren't supported.
Why has my workflow started to fail when it worked before?
- This can happen if the PAT you configured the workflow with has since expired.
- You will need to create a new PAT in Databricks, and then modify the workflow configuration in Atlan with this new PAT.
- If you are unable to update the PAT, pause the workflow and reach out to us.
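A new PAT can also be created programmatically through the Token API, as long as you still have one working credential; the lifetime below is only an example.

```python
import requests

host = "https://<workspace-host>"  # placeholder
token = "<existing-valid-pat>"     # placeholder: a credential that still works

resp = requests.post(
    f"{host}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {token}"},
    json={"lifetime_seconds": 7776000, "comment": "Atlan crawler"},  # ~90 days
)
new_pat = resp.json()["token_value"]
# Store new_pat securely, then update the workflow configuration in Atlan.
```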
How do I migrate to Unity Catalog?
- Currently Unity Catalog is in a public preview state.
- The Databricks team is working on an automated migration to Unity Catalog.
- Currently you must migrate individual tables manually.
Why are some notebooks missing from metadata extraction?
Notebooks stored inside hidden directories (names starting with ".", such as .hidden_dir/) are generally not returned by the /api/2.0/workspace/list API endpoint. This may cause notebook details to be missing in Atlan.
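You can reproduce this behavior by listing a directory yourself; the path below is hypothetical.

```python
import requests

host = "https://<workspace-host>"  # placeholder
token = "<personal-access-token>"  # placeholder

resp = requests.get(
    f"{host}/api/2.0/workspace/list",
    headers={"Authorization": f"Bearer {token}"},
    params={"path": "/Users/someone@example.com"},  # hypothetical path
)
for obj in resp.json().get("objects", []):
    print(obj["object_type"], obj["path"])  # hidden directories won't appear
```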
Why is metadata missing for some Databricks entities?
The Databricks APIs used provide data only within a single configured workspace. If an entity used in lineage creation exists outside this workspace, its details won't be available via these APIs.