Lineage analysis
This document provides SQL queries for lineage analysis scenarios in Atlan using Lakehouse (MDLH).
This page provides SQL templates and guidance for lineage analysis using the GOLD.ASSET_LOOKUP_TABLE and GOLD.FULL_LINEAGE tables in your Lakehouse.
Before you begin
Before you run these queries, make sure:
- Your Lakehouse setup is complete and Lakehouse is enabled for your organization. If setup isn't complete, see Get started with Lakehouse.
- The Gold layer is set up for your SQL engine. See Set up gold layer in Snowflake. For Databricks (Databricks setup guide) or BigQuery (BigQuery setup guide) setups, contact your Atlan Customer Success team.
You can use these SQL examples with Snowflake or another Iceberg REST–compatible engine connected to your Lakehouse.
Assets without lineage
Identify assets that are disconnected from data pipelines - potentially orphaned tables, test assets, or deprecated resources consuming storage and compute resources without providing value. These are prime candidates for cleanup, archival, or investigation.
This query identifies active data assets (tables, views, materialized views) that have no lineage connections and haven't been updated recently. It calculates how long assets have been dormant and provides cleanup recommendations based on inactivity period, certification status, and ownership. The query helps you prioritize which assets to investigate, archive, or delete to reduce storage costs and platform clutter. Time-based filters focus on truly orphaned assets versus recently created assets that simply haven't been connected yet.
- Snowflake
- Databricks
- BigQuery
-- ============================================
-- ASSETS WITHOUT LINEAGE: Find Orphaned/Unused Assets for Cleanup
-- Assets not having any lineage and have not been updated or used in the last 90 days
-- ============================================
SELECT
A.GUID,
A.ASSET_NAME, -- Asset name
A.ASSET_TYPE, -- Asset type (Table, View, etc.)
A.ASSET_QUALIFIED_NAME, -- Full path
A.CONNECTOR_NAME, -- Data platform
A.DESCRIPTION,
A.CERTIFICATE_STATUS,
A.STATUS,
A.OWNER_USERS,
A.TAGS, -- Tags attached to asset
-- Confirm no lineage
A.HAS_LINEAGE,
-- Metadata timestamps
TO_TIMESTAMP_LTZ(A.CREATED_AT / 1000) AS CREATED_DATE,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000) AS LAST_UPDATED,
-- Calculate age
DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.CREATED_AT / 1000),
CURRENT_TIMESTAMP()
) AS DAYS_SINCE_CREATION,
DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000),
CURRENT_TIMESTAMP()
) AS DAYS_SINCE_UPDATE,
-- Cleanup recommendation based on inactivity and metadata
CASE
WHEN A.CERTIFICATE_STATUS = 'DEPRECATED'
THEN '🔴 SAFE TO DELETE - Already deprecated'
WHEN DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000),
CURRENT_TIMESTAMP()
) > 180
AND A.STATUS = 'ACTIVE'
THEN '🟡 REVIEW FOR DELETION - No activity in 6+ months'
WHEN DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.CREATED_AT / 1000),
CURRENT_TIMESTAMP()
) <= 7
THEN '🟢 KEEP - Recently created, may not be connected yet'
WHEN (A.OWNER_USERS IS NULL)
THEN '🟡 INVESTIGATE - No owner, likely test/temp asset'
ELSE '🟢 REVIEW - May be intentionally standalone'
END AS CLEANUP_RECOMMENDATION
FROM ASSETS A
WHERE
-- No lineage connections
(
A.HAS_LINEAGE = FALSE
)
-- Only data assets
AND A.ASSET_TYPE IN ('Table', 'View', 'MaterializedView')
AND A.STATUS = 'ACTIVE'
-- Filter by connector (can specify multiple or single connector)
AND A.CONNECTOR_NAME IN ('snowflake', 'redshift', 'bigquery', 'databricks')
-- Only assets not updated in last 90 days (adjust as needed)
AND DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000),
CURRENT_TIMESTAMP()
) > 90
ORDER BY
DAYS_SINCE_UPDATE DESC, -- Oldest first
A.ASSET_NAME;
USE atlan_gold;
-- ============================================
-- ASSETS WITHOUT LINEAGE: Find Orphaned/Unused Assets for Cleanup
-- Assets not having any lineage and have not been updated or used in the last 90 days
-- ============================================
SELECT
A.GUID,
A.ASSET_NAME, -- Asset name
A.ASSET_TYPE, -- Asset type (Table, View, etc.)
A.ASSET_QUALIFIED_NAME, -- Full path
A.CONNECTOR_NAME, -- Data platform
A.DESCRIPTION,
A.CERTIFICATE_STATUS,
A.STATUS,
A.OWNER_USERS,
A.TAGS, -- Tags attached to asset
-- Confirm no lineage
A.HAS_LINEAGE,
-- Metadata timestamps
TO_TIMESTAMP_LTZ(A.CREATED_AT / 1000) AS CREATED_DATE,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000) AS LAST_UPDATED,
-- Calculate age
DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.CREATED_AT / 1000),
CURRENT_TIMESTAMP()
) AS DAYS_SINCE_CREATION,
DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000),
CURRENT_TIMESTAMP()
) AS DAYS_SINCE_UPDATE,
-- Cleanup recommendation based on inactivity and metadata
CASE
WHEN A.CERTIFICATE_STATUS = 'DEPRECATED'
THEN '🔴 SAFE TO DELETE - Already deprecated'
WHEN DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000),
CURRENT_TIMESTAMP()
) > 180
AND A.STATUS = 'ACTIVE'
THEN '🟡 REVIEW FOR DELETION - No activity in 6+ months'
WHEN DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.CREATED_AT / 1000),
CURRENT_TIMESTAMP()
) <= 7
THEN '🟢 KEEP - Recently created, may not be connected yet'
WHEN (A.OWNER_USERS IS NULL)
THEN '🟡 INVESTIGATE - No owner, likely test/temp asset'
ELSE '🟢 REVIEW - May be intentionally standalone'
END AS CLEANUP_RECOMMENDATION
FROM ASSETS A
WHERE
-- No lineage connections
(
A.HAS_LINEAGE = FALSE
)
-- Only data assets
AND A.ASSET_TYPE IN ('Table', 'View', 'MaterializedView')
AND A.STATUS = 'ACTIVE'
-- Filter by connector (can specify multiple or single connector)
AND A.CONNECTOR_NAME IN ('snowflake', 'redshift', 'bigquery', 'databricks')
-- Only assets not updated in last 90 days (adjust as needed)
AND DATEDIFF(
day,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000),
CURRENT_TIMESTAMP()
) > 90
ORDER BY
DAYS_SINCE_UPDATE DESC, -- Oldest first
A.ASSET_NAME;
SET @@dataset_id = 'ATLAN_GOLD';
-- ============================================
-- ASSETS WITHOUT LINEAGE: Find Orphaned/Unused Assets for Cleanup
-- Assets not having any lineage and have not been updated or used in the last 90 days
-- ============================================
SELECT
A.GUID,
A.ASSET_NAME, -- Asset name
A.ASSET_TYPE, -- Asset type (Table, View, etc.)
A.ASSET_QUALIFIED_NAME, -- Full path
A.CONNECTOR_NAME, -- Data platform
A.DESCRIPTION,
A.CERTIFICATE_STATUS,
A.STATUS,
A.OWNER_USERS,
A.TAGS, -- Tags attached to asset
-- Confirm no lineage
A.HAS_LINEAGE,
-- Metadata timestamps (convert from epoch milliseconds)
TIMESTAMP_MILLIS(A.CREATED_AT) AS CREATED_DATE,
TIMESTAMP_MILLIS(A.UPDATED_AT) AS LAST_UPDATED,
-- Calculate age (note: parameter order is reversed in BigQuery)
TIMESTAMP_DIFF(
CURRENT_TIMESTAMP(),
TIMESTAMP_MILLIS(A.CREATED_AT),
DAY
) AS DAYS_SINCE_CREATION,
TIMESTAMP_DIFF(
CURRENT_TIMESTAMP(),
TIMESTAMP_MILLIS(A.UPDATED_AT),
DAY
) AS DAYS_SINCE_UPDATE,
-- Cleanup recommendation based on inactivity and metadata
CASE
WHEN A.CERTIFICATE_STATUS = 'DEPRECATED'
THEN '🔴 SAFE TO DELETE - Already deprecated'
WHEN TIMESTAMP_DIFF(
CURRENT_TIMESTAMP(),
TIMESTAMP_MILLIS(A.UPDATED_AT),
DAY
) > 180
AND A.STATUS = 'ACTIVE'
THEN '🟡 REVIEW FOR DELETION - No activity in 6+ months'
WHEN TIMESTAMP_DIFF(
CURRENT_TIMESTAMP(),
TIMESTAMP_MILLIS(A.CREATED_AT),
DAY
) <= 7
THEN '🟢 KEEP - Recently created, may not be connected yet'
WHEN (A.OWNER_USERS IS NULL)
THEN '🟡 INVESTIGATE - No owner, likely test/temp asset'
ELSE '🟢 REVIEW - May be intentionally standalone'
END AS CLEANUP_RECOMMENDATION
FROM `ATLAN_GOLD.ASSETS` A -- Use fully qualified table name or just ATLAN_GOLD.ASSETS
WHERE
-- No lineage connections
A.HAS_LINEAGE = FALSE
-- Only data assets
AND A.ASSET_TYPE IN ('Table', 'View', 'MaterializedView')
AND A.STATUS = 'ACTIVE'
-- Filter by connector (can specify multiple or single connector)
AND A.CONNECTOR_NAME IN ('snowflake', 'redshift', 'bigquery', 'databricks')
-- Only assets not updated in last 30 days (adjust as needed)
AND TIMESTAMP_DIFF(
CURRENT_TIMESTAMP(),
TIMESTAMP_MILLIS(A.UPDATED_AT),
DAY
) > 30
ORDER BY
DAYS_SINCE_UPDATE DESC, -- Oldest first
A.ASSET_NAME;
Sample output
| ASSET_NAME | ASSET_TYPE | CONNECTOR_NAME | DAYS_SINCE_UPDATE | CLEANUP_RECOMMENDATION |
|---|---|---|---|---|
| ACCESSCONTROL_ENTITY | Table | snowflake | 71 | REVIEW - May be intentionally standalone |
| ADFACTIVITY_ENTITY | Table | snowflake | 71 | REVIEW - May be intentionally standalone |
| ADFDATAFLOW_ENTITY | Table | snowflake | 71 | REVIEW - May be intentionally standalone |
| ADFDATASET_ENTITY | Table | snowflake | 71 | REVIEW - May be intentionally standalone |
Circular dependencies
Detect assets with circular lineage where data flows back to itself through intermediate steps. This can indicate recursive queries (intentional), design issues (unintentional), or performance problems that need investigation.
This query identifies assets that reference themselves in their lineage graph, creating circular dependencies. This can occur with recursive CTEs, self-referencing views, or unintentional design issues. The query shows the direction of the circular reference (upstream or downstream) and helps you understand whether it's an intentional recursive pattern or a data modeling issue that needs correction.
- Snowflake
- Databricks
- BigQuery
USE ATLAN_GOLD;
-- ============================================
-- CIRCULAR DEPENDENCIES: Find Assets with Lineage to Themselves
-- Detects self-referencing patterns that may indicate recursive queries or design issues
-- ============================================
SELECT
A.GUID AS ASSET_GUID,
A.ASSET_NAME, -- Asset with circular reference
A.ASSET_TYPE, -- Type (Table, View, etc.)
A.ASSET_QUALIFIED_NAME AS ASSET_PATH, -- Full path
A.CONNECTOR_NAME,
A.OWNER_USERS,
A.CERTIFICATE_STATUS,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000) AS LAST_UPDATED,
-- Lineage relationship back to itself
L.DIRECTION AS LINEAGE_DIRECTION, -- UPSTREAM or DOWNSTREAM
L.LEVEL AS CIRCULAR_PATH_LENGTH, -- How many hops in the circle
L.RELATED_NAME AS RELATED_ASSET_NAME,
L.RELATED_TYPE AS RELATED_ASSET_TYPE,
-- Flag potential issues
CASE
WHEN L.LEVEL = 1 THEN '⚠️ DIRECT SELF-REFERENCE - Review immediately'
WHEN L.LEVEL <= 3 THEN '⚠️ SHORT CIRCULAR PATH - May cause performance issues'
ELSE '⚠️ LONG CIRCULAR PATH - Complex dependency chain'
END AS CIRCULAR_DEPENDENCY_SEVERITY
FROM ASSETS A
INNER JOIN LINEAGE L
ON A.GUID = L.START_GUID -- Start from this asset
WHERE
-- The related asset in lineage is the same asset (circular)
L.RELATED_GUID = A.GUID
-- Focus on specific asset types
AND A.ASSET_TYPE IN ('Table', 'View', 'MaterializedView', 'DbtModel')
-- Focus on active assets
AND A.STATUS = 'ACTIVE'
ORDER BY
L.LEVEL ASC, -- Direct circles first
A.ASSET_NAME;
USE atlan_gold;
-- ============================================
-- CIRCULAR DEPENDENCIES: Find Assets with Lineage to Themselves
-- Detects self-referencing patterns that may indicate recursive queries or design issues
-- ============================================
SELECT
A.GUID AS ASSET_GUID,
A.ASSET_NAME, -- Asset with circular reference
A.ASSET_TYPE, -- Type (Table, View, etc.)
A.ASSET_QUALIFIED_NAME AS ASSET_PATH, -- Full path
A.CONNECTOR_NAME,
A.OWNER_USERS,
A.CERTIFICATE_STATUS,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000) AS LAST_UPDATED,
-- Lineage relationship back to itself
L.DIRECTION AS LINEAGE_DIRECTION, -- UPSTREAM or DOWNSTREAM
L.LEVEL AS CIRCULAR_PATH_LENGTH, -- How many hops in the circle
L.RELATED_NAME AS RELATED_ASSET_NAME,
L.RELATED_TYPE AS RELATED_ASSET_TYPE,
-- Flag potential issues
CASE
WHEN L.LEVEL = 1 THEN '⚠️ DIRECT SELF-REFERENCE - Review immediately'
WHEN L.LEVEL <= 3 THEN '⚠️ SHORT CIRCULAR PATH - May cause performance issues'
ELSE '⚠️ LONG CIRCULAR PATH - Complex dependency chain'
END AS CIRCULAR_DEPENDENCY_SEVERITY
FROM ASSETS A
INNER JOIN LINEAGE L
ON A.GUID = L.START_GUID -- Start from this asset
WHERE
-- The related asset in lineage is the same asset (circular)
L.RELATED_GUID = A.GUID
-- Focus on specific asset types
AND A.ASSET_TYPE IN ('Table', 'View', 'MaterializedView', 'DbtModel')
-- Focus on active assets
AND A.STATUS = 'ACTIVE'
ORDER BY
L.LEVEL ASC, -- Direct circles first
A.ASSET_NAME;
SET @@dataset_id = 'ATLAN_GOLD';
-- ============================================
-- CIRCULAR DEPENDENCIES: Find Assets with Lineage to Themselves
-- Detects self-referencing patterns that may indicate recursive queries or design issues
-- ============================================
SELECT
A.GUID AS ASSET_GUID,
A.ASSET_NAME, -- Asset with circular reference
A.ASSET_TYPE, -- Type (Table, View, etc.)
A.ASSET_QUALIFIED_NAME AS ASSET_PATH, -- Full path
A.CONNECTOR_NAME,
A.OWNER_USERS,
A.CERTIFICATE_STATUS,
TIMESTAMP_MILLIS(A.UPDATED_AT) AS LAST_UPDATED,
-- Lineage relationship back to itself
L.DIRECTION AS LINEAGE_DIRECTION, -- UPSTREAM or DOWNSTREAM
L.LEVEL AS CIRCULAR_PATH_LENGTH, -- How many hops in the circle
L.RELATED_NAME AS RELATED_ASSET_NAME,
L.RELATED_TYPE AS RELATED_ASSET_TYPE,
-- Flag potential issues
CASE
WHEN L.LEVEL = 1 THEN '⚠️ DIRECT SELF-REFERENCE - Review immediately'
WHEN L.LEVEL <= 3 THEN '⚠️ SHORT CIRCULAR PATH - May cause performance issues'
ELSE '⚠️ LONG CIRCULAR PATH - Complex dependency chain'
END AS CIRCULAR_DEPENDENCY_SEVERITY
FROM ASSETS A
INNER JOIN LINEAGE L
ON A.GUID = L.START_GUID -- Start from this asset
WHERE
-- The related asset in lineage is the same asset (circular)
L.RELATED_GUID = A.GUID
-- Focus on specific asset types
AND A.ASSET_TYPE IN ('Table', 'View', 'MaterializedView', 'DbtModel')
-- Focus on active assets
AND A.STATUS = 'ACTIVE'
ORDER BY
L.LEVEL ASC, -- Direct circles first
A.ASSET_NAME;
Sample output
| ASSET_NAME | ASSET_TYPE | CONNECTOR_NAME | LINEAGE_DIRECTION | CIRCULAR_PATH_LENGTH | CIRCULAR_DEPENDENCY_SEVERITY |
|---|---|---|---|---|---|
| INSTACART_AISLES | Table | teradata | DOWNSTREAM | 1 | DIRECT SELF-REFERENCE - Review immediately |
| INSTACART_AISLES | Table | teradata | UPSTREAM | 1 | DIRECT SELF-REFERENCE - Review immediately |
| INSTACART_BEVERAGES_ORDER_CUSTOMER | Table | snowflake | DOWNSTREAM | 1 | DIRECT SELF-REFERENCE - Review immediately |
| INSTACART_BEVERAGES_ORDER_CUSTOMER | Table | snowflake | UPSTREAM | 1 | DIRECT SELF-REFERENCE - Review immediately |
| INSTACART_CUSTOMER_DETAILS | Table | teradata | UPSTREAM | 1 | DIRECT SELF-REFERENCE - Review immediately |
| INSTACART_CUSTOMER_DETAILS | Table | snowflake | DOWNSTREAM | 1 | DIRECT SELF-REFERENCE - Review immediately |
| INSTACART_DEPARTMENTS | Table | teradata | UPSTREAM | 1 | DIRECT SELF-REFERENCE - Review immediately |
Lineage coverage & analytics
These queries provide high-level metrics and analytics about lineage coverage across your data platform. These metrics are essential for measuring data governance maturity and identifying gaps in lineage capture. They help answer questions like:
- What percentage of assets have lineage?
- Which connectors have the best lineage coverage?
- How many source vs sink vs process assets exist?
Overall lineage coverage summary
Provides a high-level dashboard view of lineage coverage across the entire data platform. Calculates the overall percentage of assets that have lineage information and breaks down assets by their role in data pipelines (sources that produce data, transformers that process data, and sinks that consume data).
-- ============================================
-- LINEAGE COVERAGE SUMMARY: Overall Platform Metrics
-- Shows percentage of assets with lineage and breakdown by status
-- ============================================
SELECT
COUNT(*) AS TOTAL_ASSETS,
-- Assets with lineage
SUM(CASE WHEN HAS_LINEAGE = TRUE THEN 1 ELSE 0 END) AS ASSETS_WITH_LINEAGE,
SUM(CASE WHEN HAS_LINEAGE = FALSE THEN 1 ELSE 0 END) AS ASSETS_WITHOUT_LINEAGE,
-- Calculate percentages
ROUND(
(SUM(CASE WHEN HAS_LINEAGE = TRUE THEN 1 ELSE 0 END) * 100.0) / COUNT(*),
2
) AS LINEAGE_COVERAGE_PCT
FROM ASSETS
WHERE
STATUS = 'ACTIVE'; -- Only active assets
Sample output
| TOTAL_ASSETS | ASSETS_WITH_LINEAGE | ASSETS_WITHOUT_LINEAGE | LINEAGE_COVERAGE_PCT |
|---|---|---|---|
| 912348 | 369449 | 535160 | 40.49 |
Lineage coverage by connector
Compares lineage coverage across different data platforms (Snowflake, BigQuery, Tableau, etc.) to identify which systems have strong lineage capture and which need improvement. Shows both the percentage of assets with lineage and the average number of connections per asset, helping you understand data flow density. Use this to prioritize which connectors need better lineage instrumentation or to validate that lineage is working correctly after new connector deployments.
-- ============================================
-- LINEAGE COVERAGE BY CONNECTOR: Platform-Specific Metrics
-- Compare lineage coverage across different data platforms
-- ============================================
SELECT
CONNECTOR_NAME,
ASSET_TYPE,
COUNT(*) AS TOTAL_ASSETS,
SUM(CASE WHEN HAS_LINEAGE = TRUE THEN 1 ELSE 0 END) AS WITH_LINEAGE,
SUM(CASE WHEN HAS_LINEAGE = FALSE THEN 1 ELSE 0 END) AS WITHOUT_LINEAGE,
-- Lineage coverage percentage
ROUND(
(SUM(CASE WHEN HAS_LINEAGE = TRUE THEN 1 ELSE 0 END) * 100.0) / COUNT(*),
2
) AS LINEAGE_COVERAGE_PCT
FROM ASSETS
WHERE
STATUS = 'ACTIVE'
AND NULLIF(CONNECTOR_NAME,'') IS NOT NULL
GROUP BY
CONNECTOR_NAME,
ASSET_TYPE
ORDER BY
CONNECTOR_NAME,
LINEAGE_COVERAGE_PCT DESC,
TOTAL_ASSETS DESC;
Sample output
| CONNECTOR_NAME | ASSET_TYPE | TOTAL_ASSETS | WITH_LINEAGE | WITHOUT_LINEAGE | LINEAGE_COVERAGE_PCT |
|---|---|---|---|---|---|
| Sqoop | Connection | 1 | 0 | 1 | 0.0 |
| adls | ADLSObject | 8 | 3 | 5 | 37.5 |
| adls | ADLSContainer | 2 | 0 | 2 | 0.0 |
| adls | ADLSAccount | 2 | 0 | 2 | 0.0 |
| adls | Connection | 2 | 0 | 2 | 0.0 |
| ai | AIModel | 1 | 1 | 0 | 100.0 |
| airflow | AirflowTask | 31 | 31 | 0 | 100.0 |
| airflow | ColumnProcess | 50 | 45 | 5 | 90.0 |
| airflow | Process | 42 | 36 | 6 | 85.71 |
| airflow | AirflowDag | 6 | 0 | 6 | 0.0 |
Identify lineage hubs
Identifies the "hub" tables and views in your data ecosystem, those with the highest number of upstream and downstream connections. These are your most critical assets because they have the widest blast radius if something goes wrong. The query categorizes each asset by criticality based on downstream dependency count (20+ is CRITICAL, 10-19 is HIGH, etc.). Use this to prioritize which assets need the most rigorous data quality monitoring, change management processes, and documentation. These are the tables you must protect from breaking changes.
- Snowflake
- Databricks
- BigQuery
-- ============================================
-- MOST CONNECTED ASSETS: Identify Critical Lineage Hubs
-- Find tables/views with highest number of dependencies by counting connections from LINEAGE table
-- ============================================
SELECT
A.ASSET_NAME,
A.ASSET_TYPE,
A.ASSET_QUALIFIED_NAME,
A.CONNECTOR_NAME,
A.CERTIFICATE_STATUS,
A.OWNER_USERS,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000) AS LAST_UPDATED,
DATEDIFF(day, TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000), CURRENT_TIMESTAMP()) AS DAYS_SINCE_UPDATE,
-- Count upstream connections (direct sources feeding this asset)
COUNT(CASE WHEN L.DIRECTION = 'UPSTREAM' AND L.LEVEL = 1 THEN 1 END) AS UPSTREAM_COUNT,
-- Count downstream connections (direct consumers using this asset)
COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) AS DOWNSTREAM_COUNT,
-- Total direct connections (both upstream and downstream)
COUNT(CASE WHEN L.LEVEL = 1 THEN 1 END) AS TOTAL_CONNECTIONS,
-- Classify criticality based on downstream dependencies (blast radius)
CASE
WHEN COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) >= 20
THEN '🔴 CRITICAL - 20+ downstream dependencies'
WHEN COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) >= 10
THEN '🟠 HIGH IMPACT - 10-19 downstream dependencies'
WHEN COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) >= 5
THEN '🟡 MODERATE IMPACT - 5-9 downstream dependencies'
ELSE '🟢 LOW IMPACT - <5 downstream dependencies'
END AS CRITICALITY_LEVEL
FROM ASSETS A
LEFT JOIN LINEAGE L
ON A.GUID = L.START_GUID -- Join to get all lineage connections for this asset
WHERE
A.HAS_LINEAGE = TRUE
AND A.ASSET_TYPE IN ('Table', 'View', 'MaterializedView')
AND A.STATUS = 'ACTIVE'
GROUP BY
A.ASSET_NAME,
A.ASSET_TYPE,
A.ASSET_QUALIFIED_NAME,
A.CONNECTOR_NAME,
A.CERTIFICATE_STATUS,
A.OWNER_USERS,
A.UPDATED_AT -- Need to group by UPDATED_AT to use in DATEDIFF
HAVING
COUNT(CASE WHEN L.LEVEL = 1 THEN 1 END) > 0 -- Only assets with at least one direct connection
ORDER BY
TOTAL_CONNECTIONS DESC;
-- ============================================
-- MOST CONNECTED ASSETS: Identify Critical Lineage Hubs
-- Find tables/views with highest number of dependencies by counting connections from LINEAGE table
-- ============================================
SELECT
A.ASSET_NAME,
A.ASSET_TYPE,
A.ASSET_QUALIFIED_NAME,
A.CONNECTOR_NAME,
A.CERTIFICATE_STATUS,
A.OWNER_USERS,
TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000) AS LAST_UPDATED,
DATEDIFF(day, TO_TIMESTAMP_LTZ(A.UPDATED_AT / 1000), CURRENT_TIMESTAMP()) AS DAYS_SINCE_UPDATE,
-- Count upstream connections (direct sources feeding this asset)
COUNT(CASE WHEN L.DIRECTION = 'UPSTREAM' AND L.LEVEL = 1 THEN 1 END) AS UPSTREAM_COUNT,
-- Count downstream connections (direct consumers using this asset)
COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) AS DOWNSTREAM_COUNT,
-- Total direct connections (both upstream and downstream)
COUNT(CASE WHEN L.LEVEL = 1 THEN 1 END) AS TOTAL_CONNECTIONS,
-- Classify criticality based on downstream dependencies (blast radius)
CASE
WHEN COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) >= 20
THEN '🔴 CRITICAL - 20+ downstream dependencies'
WHEN COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) >= 10
THEN '🟠 HIGH IMPACT - 10-19 downstream dependencies'
WHEN COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) >= 5
THEN '🟡 MODERATE IMPACT - 5-9 downstream dependencies'
ELSE '🟢 LOW IMPACT - <5 downstream dependencies'
END AS CRITICALITY_LEVEL
FROM ASSETS A
LEFT JOIN LINEAGE L
ON A.GUID = L.START_GUID -- Join to get all lineage connections for this asset
WHERE
A.HAS_LINEAGE = TRUE
AND A.ASSET_TYPE IN ('Table', 'View', 'MaterializedView')
AND A.STATUS = 'ACTIVE'
GROUP BY
A.ASSET_NAME,
A.ASSET_TYPE,
A.ASSET_QUALIFIED_NAME,
A.CONNECTOR_NAME,
A.CERTIFICATE_STATUS,
A.OWNER_USERS,
A.UPDATED_AT -- Need to group by UPDATED_AT to use in DATEDIFF
HAVING
COUNT(CASE WHEN L.LEVEL = 1 THEN 1 END) > 0 -- Only assets with at least one direct connection
ORDER BY
TOTAL_CONNECTIONS DESC;
-- ============================================
-- MOST CONNECTED ASSETS: Identify Critical Lineage Hubs
-- Find tables/views with highest number of dependencies by counting connections from LINEAGE table
-- ============================================
SELECT
A.ASSET_NAME,
A.ASSET_TYPE,
A.ASSET_QUALIFIED_NAME,
A.CONNECTOR_NAME,
A.CERTIFICATE_STATUS,
A.OWNER_USERS,
TIMESTAMP_MILLIS(A.UPDATED_AT) AS LAST_UPDATED,
TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), TIMESTAMP_MILLIS(A.UPDATED_AT), DAY) AS DAYS_SINCE_UPDATE,
-- Count upstream connections (direct sources feeding this asset)
COUNT(CASE WHEN L.DIRECTION = 'UPSTREAM' AND L.LEVEL = 1 THEN 1 END) AS UPSTREAM_COUNT,
-- Count downstream connections (direct consumers using this asset)
COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) AS DOWNSTREAM_COUNT,
-- Total direct connections (both upstream and downstream)
COUNT(CASE WHEN L.LEVEL = 1 THEN 1 END) AS TOTAL_CONNECTIONS,
-- Classify criticality based on downstream dependencies (blast radius)
CASE
WHEN COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) >= 20
THEN '🔴 CRITICAL - 20+ downstream dependencies'
WHEN COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) >= 10
THEN '🟠 HIGH IMPACT - 10-19 downstream dependencies'
WHEN COUNT(CASE WHEN L.DIRECTION = 'DOWNSTREAM' AND L.LEVEL = 1 THEN 1 END) >= 5
THEN '🟡 MODERATE IMPACT - 5-9 downstream dependencies'
ELSE '🟢 LOW IMPACT - <5 downstream dependencies'
END AS CRITICALITY_LEVEL
FROM ASSETS A
LEFT JOIN LINEAGE L
ON A.GUID = L.START_GUID -- Join to get all lineage connections for this asset
WHERE
A.HAS_LINEAGE = TRUE
AND A.ASSET_TYPE IN ('Table', 'View', 'MaterializedView')
AND A.STATUS = 'ACTIVE'
GROUP BY
A.ASSET_NAME,
A.ASSET_TYPE,
A.ASSET_QUALIFIED_NAME,
A.CONNECTOR_NAME,
A.CERTIFICATE_STATUS,
A.OWNER_USERS,
A.UPDATED_AT -- Need to group by UPDATED_AT to use in TIMESTAMP_DIFF
HAVING
COUNT(CASE WHEN L.LEVEL = 1 THEN 1 END) > 0 -- Only assets with at least one direct connection
ORDER BY
TOTAL_CONNECTIONS DESC;
Sample output
| ASSET_NAME | ASSET_TYPE | CONNECTOR_NAME | UPSTREAM_COUNT | DOWNSTREAM_COUNT | TOTAL_CONNECTIONS | CRITICALITY_LEVEL |
|---|---|---|---|---|---|---|
| ORDERS_DENORMALIZED | Table | snowflake | 3 | 16 | 19 | HIGH IMPACT - 10-19 downstream dependencies |
| FACT_ORDERS | Table | snowflake | 2 | 14 | 16 | HIGH IMPACT - 10-19 downstream dependencies |
| fct_sales | Table | bigquery | 8 | 8 | 16 | MODERATE IMPACT - 5-9 downstream dependencies |
| FCT_SALES | Table | snowflake | 8 | 7 | 15 | MODERATE IMPACT - 5-9 downstream dependencies |
| DIM_CUSTOMER | Table | snowflake | 3 | 12 | 15 | HIGH IMPACT - 10-19 downstream dependencies |
| dim_employee | Table | databricks | 1 | 12 | 13 | HIGH IMPACT - 10-19 downstream dependencies |
| people | Table | postgres | 0 | 13 | 13 | HIGH IMPACT - 10-19 downstream dependencies |
| dim_customer | Table | redshift | 3 | 10 | 13 | HIGH IMPACT - 10-19 downstream dependencies |
| dim_customer | Table | databricks | 3 | 9 | 12 | MODERATE IMPACT - 5-9 downstream dependencies |
| DIM_STOCKITEM | Table | snowflake | 3 | 9 | 12 | MODERATE IMPACT - 5-9 downstream dependencies |