Lineage root cause analysis
Use Lakehouse to trace data quality issues in reports, dashboards, or downstream applications back through the entire data pipeline to find where the problem originated.
This page provides SQL templates and guidance for root cause analysis using the GOLD.ASSET_LOOKUP_TABLE and GOLD.FULL_LINEAGE tables in your Lakehouse.
Before you begin
Before you run these queries, make sure:
- Your Lakehouse setup is complete and Lakehouse is enabled for your organization. If setup isn't complete, see Get started with Lakehouse.
- The Gold layer is set up for your SQL engine. See Set up gold layer in Snowflake. For Databricks (Databricks setup guide) or BigQuery (BigQuery setup guide) setups, contact your Atlan Customer Success team.
You can run these queries directly on your lineage tables to systematically trace upstream from a problematic asset and identify likely root causes of data issues.
Trace upstream to dependencies
Start from a problematic asset such as a dashboard, report, or table and walk upstream to identify all contributing assets and systems. Use this when you see unexpected values or quality check failures downstream.
- Snowflake
- Databricks
- BigQuery
USE ATLAN_GOLD;
-- ============================================
-- ROOT CAUSE ANALYSIS: Trace Upstream to Find Source of Data Issues
-- ============================================
SELECT
A.ASSET_NAME AS PROBLEM_ASSET, -- Name of the asset with issues
A.ASSET_TYPE AS PROBLEM_ASSET_TYPE, -- Type of problematic asset (Dashboard, Table, etc.)
A.ASSET_QUALIFIED_NAME AS PROBLEM_ASSET_PATH, -- Full path to problematic asset
A.OWNER_USERS AS PROBLEM_ASSET_OWNERS, -- Who owns the problematic asset
L.RELATED_NAME AS UPSTREAM_ASSET, -- Name of upstream dependency
L.RELATED_TYPE AS UPSTREAM_ASSET_TYPE, -- Type of upstream asset
L.LEVEL AS HOPS_FROM_PROBLEM, -- How many steps back from the problem (1=direct source)
-- Additional metadata about the upstream asset (join back to ASSETS table)
RA.ASSET_QUALIFIED_NAME AS UPSTREAM_QUALIFIED_NAME, -- Full path to upstream asset
RA.OWNER_USERS AS UPSTREAM_OWNERS, -- Who owns the upstream asset
RA.CERTIFICATE_STATUS AS UPSTREAM_CERT_STATUS, -- Certification status of upstream asset
RA.CONNECTOR_NAME AS UPSTREAM_CONNECTOR -- Source system connector (Snowflake, Fivetran, etc.)
FROM ASSETS A
INNER JOIN LINEAGE L
ON A.GUID = L.START_GUID -- Join on asset identifier
LEFT JOIN ASSETS RA
ON L.RELATED_GUID = RA.GUID -- Get metadata for upstream assets
WHERE
-- START HERE: Replace with your problematic asset GUID or qualified name
A.GUID = 'ff0f00d5-dde5-4cf9-b199-2ab09a961bd2'
-- Trace UPSTREAM to find the source of the issue
AND L.DIRECTION = 'UPSTREAM'
-- Optional: Limit depth to avoid excessive recursion (default: 5 hops)
AND L.LEVEL <= 5
-- Optional: Only look at data assets, exclude process nodes
AND L.RELATED_TYPE IN ('Table', 'View', 'MaterializedView', 'Column')
ORDER BY
L.LEVEL ASC, -- Show closest sources first (1-hop, 2-hop, etc.)
L.RELATED_NAME; -- Alphabetically within each level
-- ============================================
-- ROOT CAUSE ANALYSIS: Trace Upstream to Find Source of Data Issues
-- ============================================
USE atlan_gold;
SELECT
A.ASSET_NAME AS PROBLEM_ASSET, -- Name of the asset with issues
A.ASSET_TYPE AS PROBLEM_ASSET_TYPE, -- Type of problematic asset (Dashboard, Table, etc.)
A.ASSET_QUALIFIED_NAME AS PROBLEM_ASSET_PATH, -- Full path to problematic asset
A.OWNER_USERS AS PROBLEM_ASSET_OWNERS, -- Who owns the problematic asset
L.RELATED_NAME AS UPSTREAM_ASSET, -- Name of upstream dependency
L.RELATED_TYPE AS UPSTREAM_ASSET_TYPE, -- Type of upstream asset
L.LEVEL AS HOPS_FROM_PROBLEM, -- How many steps back from the problem (1=direct source)
-- Additional metadata about the upstream asset (join back to ASSETS table)
RA.ASSET_QUALIFIED_NAME AS UPSTREAM_QUALIFIED_NAME, -- Full path to upstream asset
RA.OWNER_USERS AS UPSTREAM_OWNERS, -- Who owns the upstream asset
RA.CERTIFICATE_STATUS AS UPSTREAM_CERT_STATUS, -- Certification status of upstream asset
RA.CONNECTOR_NAME AS UPSTREAM_CONNECTOR -- Source system connector (Snowflake, Fivetran, etc.)
FROM ASSETS A
INNER JOIN LINEAGE L
ON A.GUID = L.START_GUID -- Join on asset identifier
LEFT JOIN ASSETS RA
ON L.RELATED_GUID = RA.GUID -- Get metadata for upstream assets
WHERE
-- START HERE: Replace with your problematic asset GUID or qualified name
A.GUID = 'ff0f00d5-dse5-4cf9-b199-2ab09a961bd2'
-- Trace UPSTREAM to find the source of the issue
AND L.DIRECTION = 'UPSTREAM'
-- Optional: Limit depth to avoid excessive recursion (default: 5 hops)
AND L.LEVEL <= 5
-- Optional: Only look at data assets, exclude process nodes
AND L.RELATED_TYPE IN ('Table', 'View', 'MaterializedView', 'Column')
ORDER BY
L.LEVEL ASC, -- Show closest sources first (1-hop, 2-hop, etc.)
L.RELATED_NAME; -- Alphabetically within each level
SET @@dataset_id = 'ATLAN_GOLD';
-- ============================================
-- ROOT CAUSE ANALYSIS: Trace Upstream to Find Source of Data Issues
-- ============================================
SET @@dataset_id = 'ATLAN_GOLD';
-- Add timestamp information to identify when assets were last updated
SELECT
A.ASSET_NAME AS PROBLEM_ASSET,
A.ASSET_TYPE AS PROBLEM_ASSET_TYPE,
A.ASSET_QUALIFIED_NAME AS PROBLEM_ASSET_PATH,
A.OWNER_USERS AS PROBLEM_ASSET_OWNERS,
L.RELATED_NAME AS UPSTREAM_ASSET,
L.RELATED_TYPE AS UPSTREAM_ASSET_TYPE,
L.LEVEL AS HOPS_FROM_PROBLEM,
RA.ASSET_QUALIFIED_NAME AS UPSTREAM_QUALIFIED_NAME,
RA.OWNER_USERS AS UPSTREAM_OWNERS,
RA.CERTIFICATE_STATUS AS UPSTREAM_CERT_STATUS,
RA.CONNECTOR_NAME AS UPSTREAM_CONNECTOR,
-- Add timestamp information (convert from epoch milliseconds)
TIMESTAMP_MILLIS(RA.CREATED_AT) AS UPSTREAM_CREATED_DATE,
TIMESTAMP_MILLIS(RA.UPDATED_AT) AS UPSTREAM_UPDATED_DATE,
-- Calculate how recently the upstream asset was updated
TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), TIMESTAMP_MILLIS(RA.UPDATED_AT), DAY) AS DAYS_SINCE_UPSTREAM_UPDATE
FROM ASSETS A
INNER JOIN LINEAGE L ON A.GUID = L.START_GUID
LEFT JOIN ASSETS RA ON L.RELATED_GUID = RA.GUID
WHERE
A.GUID = 'ff0f00d5-dde5-4cf9-b199-2ab09a961bd2'
AND L.DIRECTION = 'UPSTREAM'
AND L.LEVEL <= 5
AND L.RELATED_TYPE IN ('Table', 'View', 'MaterializedView', 'Column')
ORDER BY L.LEVEL ASC, L.RELATED_NAME;
Sample output
| PROBLEM_ASSET | PROBLEM_ASSET_TYPE | UPSTREAM_ASSET | UPSTREAM_ASSET_TYPE | HOPS_FROM_PROBLEM | UPSTREAM_CERT_STATUS | UPSTREAM_CONNECTOR |
|---|---|---|---|---|---|---|
| FCT_SALES | Table | DIM_CITIES | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | DIM_CUSTOMERS | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | DIM_STOCK_ITEMS | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_DELIVERY_METHODS | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_INVOICES | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_INVOICE_LINES | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_PACKAGE_TYPES | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_PEOPLE | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | DELIVERYMETHODS | Table | 2 | VERIFIED | snowflake |
| FCT_SALES | Table | DIM_CITIES | Table | 2 | VERIFIED | snowflake |