Skip to main content

Lineage root cause analysis

Use Lakehouse to trace data quality issues in reports, dashboards, or downstream applications back through the entire data pipeline to find where the problem originated.

These queries use the ASSETS table from the GOLD namespace in the Lakehouse catalog and the LINEAGE table, which you create separately in your warehouse (see Set up lineage tables).

Before you begin

Before you run these queries, make sure:

  • Your Lakehouse setup is complete and Lakehouse is enabled for your organization. If setup isn't complete, see Get started with Lakehouse.
  • The LINEAGE table is set up in your warehouse. This table is not part of the native GOLD namespace and must be created separately. See Set up lineage tables.

You can run these queries directly on your lineage tables to systematically trace upstream from a problematic asset and identify likely root causes of data issues.

Trace upstream to dependencies

Start from a problematic asset such as a dashboard, report, or table and walk upstream to identify all contributing assets and systems. Use this when you see unexpected values or quality check failures downstream.

-- ============================================
-- ROOT CAUSE ANALYSIS: Trace Upstream to Find Source of Data Issues
-- gold.assets: Lakehouse catalog (GOLD namespace)
-- LINEAGE: Customer-managed table (see "Set up lineage tables")
-- ============================================

SELECT
a.asset_name AS PROBLEM_ASSET, -- Name of the asset with issues
a.asset_type AS PROBLEM_ASSET_TYPE, -- Type of problematic asset (Dashboard, Table, etc.)
a.asset_qualified_name AS PROBLEM_ASSET_PATH, -- Full path to problematic asset
a.owner_users AS PROBLEM_ASSET_OWNERS, -- Who owns the problematic asset

l.related_name AS UPSTREAM_ASSET, -- Name of upstream dependency
l.related_type AS UPSTREAM_ASSET_TYPE, -- Type of upstream asset
l.level AS HOPS_FROM_PROBLEM, -- How many steps back from the problem (1=direct source)

-- Additional metadata about the upstream asset (join back to ASSETS table)
ra.asset_qualified_name AS UPSTREAM_QUALIFIED_NAME, -- Full path to upstream asset
ra.owner_users AS UPSTREAM_OWNERS, -- Who owns the upstream asset
ra.certificate_status AS UPSTREAM_CERT_STATUS, -- Certification status of upstream asset
ra.connector_name AS UPSTREAM_CONNECTOR -- Source system connector (Snowflake, Fivetran, etc.)

FROM gold.assets a
INNER JOIN LINEAGE l
ON a.guid = l.start_guid -- Join on asset identifier
LEFT JOIN gold.assets ra
ON l.related_guid = ra.guid -- Get metadata for upstream assets

WHERE
-- START HERE: Replace with your problematic asset guid or qualified name
a.guid = 'ff0f00d5-dde5-4cf9-b199-2ab09a961bd2'

-- Trace UPSTREAM to find the source of the issue
AND l.direction = 'UPSTREAM'

-- Optional: Limit depth to avoid excessive recursion (default: 5 hops)
AND l.level <= 5

-- Optional: Only look at data assets, exclude process nodes
AND l.related_type IN ('Table', 'View', 'MaterializedView', 'Column')

ORDER BY
l.level ASC, -- Show closest sources first (1-hop, 2-hop, etc.)
l.related_name; -- Alphabetically within each level
Sample output
PROBLEM_ASSETPROBLEM_ASSET_TYPEUPSTREAM_ASSETUPSTREAM_ASSET_TYPEHOPS_FROM_PROBLEMUPSTREAM_CERT_STATUSUPSTREAM_CONNECTOR
FCT_SALESTableDIM_CITIESTable1VERIFIEDsnowflake
FCT_SALESTableDIM_CUSTOMERSTable1VERIFIEDsnowflake
FCT_SALESTableDIM_STOCK_ITEMSTable1VERIFIEDsnowflake
FCT_SALESTableSTG_DELIVERY_METHODSTable1VERIFIEDsnowflake
FCT_SALESTableSTG_INVOICESTable1VERIFIEDsnowflake
FCT_SALESTableSTG_INVOICE_LINESTable1VERIFIEDsnowflake
FCT_SALESTableSTG_PACKAGE_TYPESTable1VERIFIEDsnowflake
FCT_SALESTableSTG_PEOPLETable1VERIFIEDsnowflake
FCT_SALESTableDELIVERYMETHODSTable2VERIFIEDsnowflake
FCT_SALESTableDIM_CITIESTable2VERIFIEDsnowflake