Skip to main content

Lineage root cause analysis

Use Lakehouse to trace data quality issues in reports, dashboards, or downstream applications back through the entire data pipeline to find where the problem originated.

This page provides SQL templates and guidance for root cause analysis using the GOLD.ASSET_LOOKUP_TABLE and GOLD.FULL_LINEAGE tables in your Lakehouse.

Before you begin

Before you run these queries, make sure:

You can run these queries directly on your lineage tables to systematically trace upstream from a problematic asset and identify likely root causes of data issues.

Trace upstream to dependencies

Start from a problematic asset such as a dashboard, report, or table and walk upstream to identify all contributing assets and systems. Use this when you see unexpected values or quality check failures downstream.

USE ATLAN_GOLD;

-- ============================================
-- ROOT CAUSE ANALYSIS: Trace Upstream to Find Source of Data Issues
-- ============================================

SELECT
A.ASSET_NAME AS PROBLEM_ASSET, -- Name of the asset with issues
A.ASSET_TYPE AS PROBLEM_ASSET_TYPE, -- Type of problematic asset (Dashboard, Table, etc.)
A.ASSET_QUALIFIED_NAME AS PROBLEM_ASSET_PATH, -- Full path to problematic asset
A.OWNER_USERS AS PROBLEM_ASSET_OWNERS, -- Who owns the problematic asset

L.RELATED_NAME AS UPSTREAM_ASSET, -- Name of upstream dependency
L.RELATED_TYPE AS UPSTREAM_ASSET_TYPE, -- Type of upstream asset
L.LEVEL AS HOPS_FROM_PROBLEM, -- How many steps back from the problem (1=direct source)

-- Additional metadata about the upstream asset (join back to ASSETS table)
RA.ASSET_QUALIFIED_NAME AS UPSTREAM_QUALIFIED_NAME, -- Full path to upstream asset
RA.OWNER_USERS AS UPSTREAM_OWNERS, -- Who owns the upstream asset
RA.CERTIFICATE_STATUS AS UPSTREAM_CERT_STATUS, -- Certification status of upstream asset
RA.CONNECTOR_NAME AS UPSTREAM_CONNECTOR -- Source system connector (Snowflake, Fivetran, etc.)

FROM ASSETS A
INNER JOIN LINEAGE L
ON A.GUID = L.START_GUID -- Join on asset identifier
LEFT JOIN ASSETS RA
ON L.RELATED_GUID = RA.GUID -- Get metadata for upstream assets

WHERE
-- START HERE: Replace with your problematic asset GUID or qualified name
A.GUID = 'ff0f00d5-dde5-4cf9-b199-2ab09a961bd2'

-- Trace UPSTREAM to find the source of the issue
AND L.DIRECTION = 'UPSTREAM'

-- Optional: Limit depth to avoid excessive recursion (default: 5 hops)
AND L.LEVEL <= 5

-- Optional: Only look at data assets, exclude process nodes
AND L.RELATED_TYPE IN ('Table', 'View', 'MaterializedView', 'Column')

ORDER BY
L.LEVEL ASC, -- Show closest sources first (1-hop, 2-hop, etc.)
L.RELATED_NAME; -- Alphabetically within each level
Sample output
PROBLEM_ASSETPROBLEM_ASSET_TYPEUPSTREAM_ASSETUPSTREAM_ASSET_TYPEHOPS_FROM_PROBLEMUPSTREAM_CERT_STATUSUPSTREAM_CONNECTOR
FCT_SALESTableDIM_CITIESTable1VERIFIEDsnowflake
FCT_SALESTableDIM_CUSTOMERSTable1VERIFIEDsnowflake
FCT_SALESTableDIM_STOCK_ITEMSTable1VERIFIEDsnowflake
FCT_SALESTableSTG_DELIVERY_METHODSTable1VERIFIEDsnowflake
FCT_SALESTableSTG_INVOICESTable1VERIFIEDsnowflake
FCT_SALESTableSTG_INVOICE_LINESTable1VERIFIEDsnowflake
FCT_SALESTableSTG_PACKAGE_TYPESTable1VERIFIEDsnowflake
FCT_SALESTableSTG_PEOPLETable1VERIFIEDsnowflake
FCT_SALESTableDELIVERYMETHODSTable2VERIFIEDsnowflake
FCT_SALESTableDIM_CITIESTable2VERIFIEDsnowflake