Databricks miner app
The Databricks miner app mines lineage and query history from Databricks Unity
Catalog system tables to generate lineage and usage (popularity) metrics. Build it
with the DatabricksMiner builder.
A miner does not create a connection or take a credential—it runs on an
existing Databricks connection and reuses that connection's own credential, so
you only supply the connection's qualifiedName.
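Because the qualifiedName is the only input identifying the connection, a malformed value is an easy mistake to make. The following is a minimal sketch of a pre-flight check, assuming connection qualified names follow the three-part shape of the example value used on this page (default/databricks/1700000000). The helper name and the pattern are hypothetical and not part of pyatlan.

```python
import re

# Assumed shape, inferred from the example value on this page:
# <tenant>/databricks/<digits>. This is not an official specification.
_CONNECTION_QN = re.compile(r"[^/]+/databricks/\d+")


def looks_like_databricks_connection(qualified_name: str) -> bool:
    """Return True if the string matches the assumed connection shape."""
    return _CONNECTION_QN.fullmatch(qualified_name) is not None


print(looks_like_databricks_connection("default/databricks/1700000000"))  # True
print(looks_like_databricks_connection("default/databricks"))  # False
```

A check like this only catches obvious typos; it doesn't confirm that the connection exists in your tenant.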
Source extraction
Mine lineage and query history from Databricks
from pyatlan.client.atlan import AtlanClient
from pyatlan.model.apps import DatabricksMiner

client = AtlanClient()

response = (
    DatabricksMiner(client)
    .connection(  # (1)
        qualified_name="default/databricks/1700000000",
    )
    .lineage_extraction_method("system-table")  # (2)
    .sql_warehouse_id("abc123def456")  # (3)
    .fetch_query_history_and_calculate_popularity(True)  # (4)
    .start_date(1704067200)  # (5)
    .popularity_window_days(30)  # (6)
    .run(name="databricks-prod-miner")  # (7)
)
print(response.slug, response.run_id)
1. Required. The exact qualifiedName of the existing Databricks connection to mine. Its credential is reused, so no credential step is needed.
2. Optional. system-table reads lineage from system.access.*; offline skips extraction (for when lineage is pre-computed upstream).
3. Optional. The SQL warehouse ID used by the Statement Execution API for lineage extraction.
4. Optional. Aggregate query-history counts into popularity scores (requires system.access.query_history).
5. Optional. Fetch queries from this date onwards for query-history mining and popularity (doesn't affect lineage extraction).
6. Optional. Lookback window in days for popularity (30 = last month).
7. Always pass an explicit name for miners: a bare .run() defaults to the app id (databricks-miner), and a second run can collide.
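The start_date value in the example, 1704067200, corresponds to 2024-01-01 00:00:00 UTC expressed as Unix epoch seconds. The sketch below shows one way to derive such a value with the standard library, assuming the builder expects epoch seconds (the unit is inferred from the example and not stated here). The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole Unix epoch seconds."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time surprises")
    return int(moment.timestamp())


# A fixed calendar date in UTC.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(to_epoch_seconds(start))  # 1704067200

# A start date 30 days before a chosen reference point.
reference = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(to_epoch_seconds(reference - timedelta(days=30)))  # 1704067200
```

Using a timezone-aware datetime keeps the result independent of the machine's local time zone.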
Popularity from a cloned catalog
To mine from a cloned catalog and schema rather than the system catalog, chain
the settings below. Each one is optional:
Popularity / lineage from a cloned catalog
(
    DatabricksMiner(client)
    .connection(qualified_name="default/databricks/1700000000")
    .extraction_catalog_type("cloned-catalog")  # (1)
    .cloned_catalog_name("my_clone")  # (2)
    .cloned_schema_name("access_clone")  # (3)
    .extraction_catalog_for_popularity("cloned-catalog")  # (4)
    .cloned_catalog_name_for_popularity("my_clone")  # (5)
    .cloned_schema_name_for_popularity("query_clone")  # (6)
    .set_sql_warehouse_popularity("abc123def456")  # (7)
    .excluded_users(["svc-account"])  # (8)
    .enable_file_path_lineage(False)  # (9)
    .run(name="databricks-prod-miner")
)
1. Catalog to use for lineage extraction (system by default, or cloned-catalog).
2. Name of the catalog containing the cloned schema for lineage.
3. Name of the schema containing the cloned tables for lineage.
4. Catalog to use for popularity extraction.
5. Name of the catalog containing the cloned schema for popularity.
6. Name of the schema containing the cloned tables for popularity.
7. The SQL warehouse ID used for popularity extraction.
8. Users whose queries to exclude from usage metrics.
9. Track lineage at the file-path level for volumes and external locations.
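The cloned catalog and schema names combine with a table name under Unity Catalog's three-level namespace (catalog.schema.table). As a rough illustration of where the cloned tables would be expected to live, the sketch below joins the example values from this page. The helper is hypothetical, and the table name table_lineage is an assumption: this page doesn't list which tables the miner actually reads.

```python
def cloned_table(catalog: str, schema: str, table: str) -> str:
    """Join catalog, schema, and table into a three-level Unity Catalog name."""
    parts = (catalog, schema, table)
    # Reject empty parts and embedded dots, which would change the namespace depth.
    if any(not part or "." in part for part in parts):
        raise ValueError(f"each part must be non-empty and contain no dots: {parts}")
    return ".".join(parts)


# Example values from this page; the table name is an assumption.
print(cloned_table("my_clone", "access_clone", "table_lineage"))
# my_clone.access_clone.table_lineage
```

Checking that such a table is visible to the connection's credential before the first run can save a failed workflow.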