Lineage root cause analysis
Use Lakehouse to trace data quality issues in reports, dashboards, or downstream applications back through the entire data pipeline to find where the problem originated.
These queries use the ASSETS table from the GOLD namespace in the Lakehouse catalog and the LINEAGE table, which you create separately in your warehouse (see Set up lineage tables).
Before you begin
Before you run these queries, make sure:
- Your Lakehouse setup is complete and Lakehouse is enabled for your organization. If setup isn't complete, see Get started with Lakehouse.
- The
LINEAGEtable is set up in your warehouse. This table is not part of the native GOLD namespace and must be created separately. See Set up lineage tables.
You can run these queries directly on your lineage tables to systematically trace upstream from a problematic asset and identify likely root causes of data issues.
Trace upstream to dependencies
Start from a problematic asset such as a dashboard, report, or table and walk upstream to identify all contributing assets and systems. Use this when you see unexpected values or quality check failures downstream.
- Snowflake
- Databricks
- BigQuery
-- ============================================
-- ROOT CAUSE ANALYSIS: Trace Upstream to Find Source of Data Issues
-- gold.assets: Lakehouse catalog (GOLD namespace)
-- LINEAGE: Customer-managed table (see "Set up lineage tables")
-- ============================================
SELECT
a.asset_name AS PROBLEM_ASSET, -- Name of the asset with issues
a.asset_type AS PROBLEM_ASSET_TYPE, -- Type of problematic asset (Dashboard, Table, etc.)
a.asset_qualified_name AS PROBLEM_ASSET_PATH, -- Full path to problematic asset
a.owner_users AS PROBLEM_ASSET_OWNERS, -- Who owns the problematic asset
l.related_name AS UPSTREAM_ASSET, -- Name of upstream dependency
l.related_type AS UPSTREAM_ASSET_TYPE, -- Type of upstream asset
l.level AS HOPS_FROM_PROBLEM, -- How many steps back from the problem (1=direct source)
-- Additional metadata about the upstream asset (join back to ASSETS table)
ra.asset_qualified_name AS UPSTREAM_QUALIFIED_NAME, -- Full path to upstream asset
ra.owner_users AS UPSTREAM_OWNERS, -- Who owns the upstream asset
ra.certificate_status AS UPSTREAM_CERT_STATUS, -- Certification status of upstream asset
ra.connector_name AS UPSTREAM_CONNECTOR -- Source system connector (Snowflake, Fivetran, etc.)
FROM gold.assets a
INNER JOIN LINEAGE l
ON a.guid = l.start_guid -- Join on asset identifier
LEFT JOIN gold.assets ra
ON l.related_guid = ra.guid -- Get metadata for upstream assets
WHERE
-- START HERE: Replace with your problematic asset guid or qualified name
a.guid = 'ff0f00d5-dde5-4cf9-b199-2ab09a961bd2'
-- Trace UPSTREAM to find the source of the issue
AND l.direction = 'UPSTREAM'
-- Optional: Limit depth to avoid excessive recursion (default: 5 hops)
AND l.level <= 5
-- Optional: Only look at data assets, exclude process nodes
AND l.related_type IN ('Table', 'View', 'MaterializedView', 'Column')
ORDER BY
l.level ASC, -- Show closest sources first (1-hop, 2-hop, etc.)
l.related_name; -- Alphabetically within each level
-- ============================================
-- ROOT CAUSE ANALYSIS: Trace Upstream to Find Source of Data Issues
-- gold.assets: Lakehouse catalog (GOLD namespace)
-- LINEAGE: Customer-managed table (see "Set up lineage tables")
-- ============================================
SELECT
a.asset_name AS PROBLEM_ASSET, -- Name of the asset with issues
a.asset_type AS PROBLEM_ASSET_TYPE, -- Type of problematic asset (Dashboard, Table, etc.)
a.asset_qualified_name AS PROBLEM_ASSET_PATH, -- Full path to problematic asset
a.owner_users AS PROBLEM_ASSET_OWNERS, -- Who owns the problematic asset
l.related_name AS UPSTREAM_ASSET, -- Name of upstream dependency
l.related_type AS UPSTREAM_ASSET_TYPE, -- Type of upstream asset
l.level AS HOPS_FROM_PROBLEM, -- How many steps back from the problem (1=direct source)
-- Additional metadata about the upstream asset (join back to ASSETS table)
ra.asset_qualified_name AS UPSTREAM_QUALIFIED_NAME, -- Full path to upstream asset
ra.owner_users AS UPSTREAM_OWNERS, -- Who owns the upstream asset
ra.certificate_status AS UPSTREAM_CERT_STATUS, -- Certification status of upstream asset
ra.connector_name AS UPSTREAM_CONNECTOR -- Source system connector (Snowflake, Fivetran, etc.)
FROM gold.assets a
INNER JOIN LINEAGE l
ON a.guid = l.start_guid -- Join on asset identifier
LEFT JOIN gold.assets ra
ON l.related_guid = ra.guid -- Get metadata for upstream assets
WHERE
-- START HERE: Replace with your problematic asset guid or qualified name
a.guid = 'ff0f00d5-dse5-4cf9-b199-2ab09a961bd2'
-- Trace UPSTREAM to find the source of the issue
AND l.direction = 'UPSTREAM'
-- Optional: Limit depth to avoid excessive recursion (default: 5 hops)
AND l.level <= 5
-- Optional: Only look at data assets, exclude process nodes
AND l.related_type IN ('Table', 'View', 'MaterializedView', 'Column')
ORDER BY
l.level ASC, -- Show closest sources first (1-hop, 2-hop, etc.)
l.related_name; -- Alphabetically within each level
-- ============================================
-- ROOT CAUSE ANALYSIS: Trace Upstream to Find Source of Data Issues
-- gold.assets: Lakehouse catalog (GOLD namespace)
-- LINEAGE: Customer-managed table (see "Set up lineage tables")
-- ============================================
-- Add timestamp information to identify when assets were last updated
SELECT
a.asset_name AS PROBLEM_ASSET,
a.asset_type AS PROBLEM_ASSET_TYPE,
a.asset_qualified_name AS PROBLEM_ASSET_PATH,
a.owner_users AS PROBLEM_ASSET_OWNERS,
l.related_name AS UPSTREAM_ASSET,
l.related_type AS UPSTREAM_ASSET_TYPE,
l.level AS HOPS_FROM_PROBLEM,
ra.asset_qualified_name AS UPSTREAM_QUALIFIED_NAME,
ra.owner_users AS UPSTREAM_OWNERS,
ra.certificate_status AS UPSTREAM_CERT_STATUS,
ra.connector_name AS UPSTREAM_CONNECTOR,
-- Add timestamp information (convert from epoch milliseconds)
TIMESTAMP_MILLIS(ra.created_at) AS UPSTREAM_CREATED_DATE,
TIMESTAMP_MILLIS(ra.updated_at) AS UPSTREAM_UPDATED_DATE,
-- Calculate how recently the upstream asset was updated
TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), TIMESTAMP_MILLIS(ra.updated_at), DAY) AS DAYS_SINCE_UPSTREAM_UPDATE
FROM gold.assets a
INNER JOIN LINEAGE l ON a.guid = l.start_guid
LEFT JOIN gold.assets ra ON l.related_guid = ra.guid
WHERE
a.guid = 'ff0f00d5-dde5-4cf9-b199-2ab09a961bd2'
AND l.direction = 'UPSTREAM'
AND l.level <= 5
AND l.related_type IN ('Table', 'View', 'MaterializedView', 'Column')
ORDER BY l.level ASC, l.related_name;
Sample output
| PROBLEM_ASSET | PROBLEM_ASSET_TYPE | UPSTREAM_ASSET | UPSTREAM_ASSET_TYPE | HOPS_FROM_PROBLEM | UPSTREAM_CERT_STATUS | UPSTREAM_CONNECTOR |
|---|---|---|---|---|---|---|
| FCT_SALES | Table | DIM_CITIES | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | DIM_CUSTOMERS | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | DIM_STOCK_ITEMS | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_DELIVERY_METHODS | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_INVOICES | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_INVOICE_LINES | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_PACKAGE_TYPES | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | STG_PEOPLE | Table | 1 | VERIFIED | snowflake |
| FCT_SALES | Table | DELIVERYMETHODS | Table | 2 | VERIFIED | snowflake |
| FCT_SALES | Table | DIM_CITIES | Table | 2 | VERIFIED | snowflake |