Set up lineage tables
This guide walks you through creating a helper LINEAGE view in your warehouse that works with the GOLD namespace's native LINEAGE_ADJACENCY_LIST table. With this view, you can run advanced lineage queries that include direction (upstream/downstream), hop level, and asset names in a single result set.
Prerequisites
Before you begin, make sure that:
- Lakehouse is enabled for your tenant and the status shows Ready. See Enable Lakehouse.
- Your query engine is connected to Lakehouse. See Connect Snowflake or Connect Athena.
- You have write access to a database in your warehouse where you can create views and tables. (The Lakehouse catalog is read-only.)
Create LINEAGE view
The LINEAGE view is built on top of a BASE_EDGES table that enriches lineage edges with asset names and types. Follow these steps to create both objects in your warehouse.
- Snowflake
- Databricks
- BigQuery
-
In your Snowflake warehouse, open a SQL editor and replace the placeholder values in the SQL below:
- Replace
MY_DATABASE.MY_SCHEMAwith your target database and schema - Replace
ATLAN_CONTEXT_STOREwith your Lakehouse catalog database name
Then, create the
BASE_EDGEStable by running the following SQL. This table joins each edge inLINEAGE_ADJACENCY_LISTto theASSETStable to attach human-readable names and types to both ends: - Replace
CREATE OR REPLACE TABLE MY_DATABASE.MY_SCHEMA.BASE_EDGES (
input_guid,
input_name,
input_type,
output_guid,
output_name,
output_type
)
COMMENT = 'Lineage edges enriched with asset names and types'
AS
SELECT DISTINCT
e.from_guid AS input_guid,
COALESCE(i.asset_name, e.from_guid) AS input_name,
COALESCE(i.asset_type, 'unknown') AS input_type,
e.to_guid AS output_guid,
COALESCE(o.asset_name, e.to_guid) AS output_name,
COALESCE(o.asset_type, 'unknown') AS output_type
FROM ATLAN_CONTEXT_STORE."gold".lineage_adjacency_list e
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM ATLAN_CONTEXT_STORE."gold".assets
) i ON e.from_guid = i.guid
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM ATLAN_CONTEXT_STORE."gold".assets
) o ON e.to_guid = o.guid
WHERE e.from_guid IS NOT NULL
AND e.to_guid IS NOT NULL;
- Create the
LINEAGEview by running the following SQL. This view computes multi-hop upstream and downstream lineage using a recursive CTE:
CREATE OR REPLACE VIEW MY_DATABASE.MY_SCHEMA.LINEAGE (
direction,
start_guid,
start_name,
start_type,
related_guid,
related_name,
related_type,
connecting_guid,
level
)
COMMENT = 'Multi-hop upstream and downstream lineage for all assets'
AS
WITH
DOWNSTREAM AS (
SELECT
input_guid AS start_guid,
input_name AS start_name,
input_type AS start_type,
output_guid AS related_guid,
output_name AS related_name,
output_type AS related_type,
input_guid AS connecting_guid,
1 AS level,
input_guid || ',' || output_guid AS path_str
FROM MY_DATABASE.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
d.start_guid,
d.start_name,
d.start_type,
e.output_guid AS related_guid,
e.output_name AS related_name,
e.output_type AS related_type,
d.related_guid AS connecting_guid,
d.level + 1 AS level,
d.path_str || ',' || e.output_guid AS path_str
FROM DOWNSTREAM d
JOIN MY_DATABASE.MY_SCHEMA.BASE_EDGES e
ON d.related_guid = e.input_guid
WHERE POSITION(e.output_guid IN d.path_str) = 0
),
UPSTREAM AS (
SELECT
output_guid AS start_guid,
output_name AS start_name,
output_type AS start_type,
input_guid AS related_guid,
input_name AS related_name,
input_type AS related_type,
output_guid AS connecting_guid,
1 AS level,
output_guid || ',' || input_guid AS path_str
FROM MY_DATABASE.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
u.start_guid,
u.start_name,
u.start_type,
e.input_guid AS related_guid,
e.input_name AS related_name,
e.input_type AS related_type,
u.related_guid AS connecting_guid,
u.level + 1 AS level,
u.path_str || ',' || e.input_guid AS path_str
FROM UPSTREAM u
JOIN MY_DATABASE.MY_SCHEMA.BASE_EDGES e
ON u.related_guid = e.output_guid
WHERE POSITION(e.input_guid IN u.path_str) = 0
)
SELECT DISTINCT
'DOWNSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM DOWNSTREAM
WHERE related_guid IS NOT NULL
UNION ALL
SELECT DISTINCT
'UPSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM UPSTREAM
WHERE related_guid IS NOT NULL;
-
In your Databricks workspace, open a SQL editor and replace the placeholder values in the SQL below:
- Replace
MY_CATALOG.MY_SCHEMAwith your target catalog and schema - Replace
LAKEHOUSE_CATALOGwith your Lakehouse catalog name in Unity Catalog
Then, create the
BASE_EDGESmaterialized view by running the following SQL. This joins each edge inLINEAGE_ADJACENCY_LISTto theASSETStable to attach human-readable names and types to both ends: - Replace
CREATE OR REPLACE MATERIALIZED VIEW MY_CATALOG.MY_SCHEMA.BASE_EDGES
COMMENT 'Lineage edges enriched with asset names and types'
AS
SELECT DISTINCT
e.from_guid AS input_guid,
COALESCE(i.asset_name, e.from_guid) AS input_name,
COALESCE(i.asset_type, 'unknown') AS input_type,
e.to_guid AS output_guid,
COALESCE(o.asset_name, e.to_guid) AS output_name,
COALESCE(o.asset_type, 'unknown') AS output_type
FROM LAKEHOUSE_CATALOG.gold.lineage_adjacency_list e
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM LAKEHOUSE_CATALOG.gold.assets
) i ON e.from_guid = i.guid
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM LAKEHOUSE_CATALOG.gold.assets
) o ON e.to_guid = o.guid
WHERE e.from_guid IS NOT NULL
AND e.to_guid IS NOT NULL;
- Create the
LINEAGEview by running the following SQL. This view computes multi-hop upstream and downstream lineage using a recursive CTE:
CREATE OR REPLACE VIEW MY_CATALOG.MY_SCHEMA.LINEAGE (
direction,
start_guid,
start_name,
start_type,
related_guid,
related_name,
related_type,
connecting_guid,
level
)
COMMENT 'Multi-hop upstream and downstream lineage for all assets'
AS
WITH RECURSIVE
DOWNSTREAM AS (
SELECT
input_guid AS start_guid,
input_name AS start_name,
input_type AS start_type,
output_guid AS related_guid,
output_name AS related_name,
output_type AS related_type,
input_guid AS connecting_guid,
1 AS level,
input_guid || ',' || output_guid AS path_str
FROM MY_CATALOG.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
d.start_guid,
d.start_name,
d.start_type,
e.output_guid AS related_guid,
e.output_name AS related_name,
e.output_type AS related_type,
d.related_guid AS connecting_guid,
d.level + 1 AS level,
d.path_str || ',' || e.output_guid AS path_str
FROM DOWNSTREAM d
JOIN MY_CATALOG.MY_SCHEMA.BASE_EDGES e
ON d.related_guid = e.input_guid
WHERE LOCATE(e.output_guid, d.path_str) = 0
),
UPSTREAM AS (
SELECT
output_guid AS start_guid,
output_name AS start_name,
output_type AS start_type,
input_guid AS related_guid,
input_name AS related_name,
input_type AS related_type,
output_guid AS connecting_guid,
1 AS level,
output_guid || ',' || input_guid AS path_str
FROM MY_CATALOG.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
u.start_guid,
u.start_name,
u.start_type,
e.input_guid AS related_guid,
e.input_name AS related_name,
e.input_type AS related_type,
u.related_guid AS connecting_guid,
u.level + 1 AS level,
u.path_str || ',' || e.input_guid AS path_str
FROM UPSTREAM u
JOIN MY_CATALOG.MY_SCHEMA.BASE_EDGES e
ON u.related_guid = e.output_guid
WHERE LOCATE(e.input_guid, u.path_str) = 0
)
SELECT DISTINCT
'DOWNSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM DOWNSTREAM
WHERE related_guid IS NOT NULL
UNION ALL
SELECT DISTINCT
'UPSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM UPSTREAM
WHERE related_guid IS NOT NULL;
-
In the BigQuery console, open the SQL editor and replace the placeholder values in the SQL below:
- Replace
MY_PROJECT.MY_DATASETwith your target project and dataset - Replace
PROJECT_ID.LAKEHOUSE_DATASETwith the project and dataset where your Lakehouse external tables are registered
Then, create the
BASE_EDGEStable by running the following SQL. This table joins each edge inLINEAGE_ADJACENCY_LISTto theASSETStable to attach human-readable names and types to both ends: - Replace
CREATE OR REPLACE TABLE MY_PROJECT.MY_DATASET.BASE_EDGES
OPTIONS(description='Lineage edges enriched with asset names and types')
AS
SELECT DISTINCT
e.from_guid AS input_guid,
COALESCE(i.asset_name, e.from_guid) AS input_name,
COALESCE(i.asset_type, 'unknown') AS input_type,
e.to_guid AS output_guid,
COALESCE(o.asset_name, e.to_guid) AS output_name,
COALESCE(o.asset_type, 'unknown') AS output_type
FROM `PROJECT_ID.LAKEHOUSE_DATASET.lineage_adjacency_list` e
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM `PROJECT_ID.LAKEHOUSE_DATASET.assets`
) i ON e.from_guid = i.guid
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM `PROJECT_ID.LAKEHOUSE_DATASET.assets`
) o ON e.to_guid = o.guid
WHERE e.from_guid IS NOT NULL
AND e.to_guid IS NOT NULL;
- Create the
LINEAGEview by running the following SQL. This view computes multi-hop upstream and downstream lineage using a recursive CTE:
CREATE OR REPLACE VIEW MY_PROJECT.MY_DATASET.LINEAGE
OPTIONS(description='Multi-hop upstream and downstream lineage for all assets')
AS
WITH RECURSIVE
DOWNSTREAM AS (
SELECT
input_guid AS start_guid,
input_name AS start_name,
input_type AS start_type,
output_guid AS related_guid,
output_name AS related_name,
output_type AS related_type,
input_guid AS connecting_guid,
1 AS level,
input_guid || ',' || output_guid AS path_str
FROM MY_PROJECT.MY_DATASET.BASE_EDGES
UNION ALL
SELECT
d.start_guid,
d.start_name,
d.start_type,
e.output_guid AS related_guid,
e.output_name AS related_name,
e.output_type AS related_type,
d.related_guid AS connecting_guid,
d.level + 1 AS level,
d.path_str || ',' || e.output_guid AS path_str
FROM DOWNSTREAM d
JOIN MY_PROJECT.MY_DATASET.BASE_EDGES e
ON d.related_guid = e.input_guid
WHERE STRPOS(d.path_str, e.output_guid) = 0
),
UPSTREAM AS (
SELECT
output_guid AS start_guid,
output_name AS start_name,
output_type AS start_type,
input_guid AS related_guid,
input_name AS related_name,
input_type AS related_type,
output_guid AS connecting_guid,
1 AS level,
output_guid || ',' || input_guid AS path_str
FROM MY_PROJECT.MY_DATASET.BASE_EDGES
UNION ALL
SELECT
u.start_guid,
u.start_name,
u.start_type,
e.input_guid AS related_guid,
e.input_name AS related_name,
e.input_type AS related_type,
u.related_guid AS connecting_guid,
u.level + 1 AS level,
u.path_str || ',' || e.input_guid AS path_str
FROM UPSTREAM u
JOIN MY_PROJECT.MY_DATASET.BASE_EDGES e
ON u.related_guid = e.output_guid
WHERE STRPOS(u.path_str, e.input_guid) = 0
)
SELECT DISTINCT
'DOWNSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM DOWNSTREAM
WHERE related_guid IS NOT NULL
UNION ALL
SELECT DISTINCT
'UPSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM UPSTREAM
WHERE related_guid IS NOT NULL;
Verify setup
After creating both objects, run a quick validation:
-- Check that edges were populated
SELECT COUNT(*) FROM MY_DATABASE.MY_SCHEMA.BASE_EDGES;
-- Check that lineage traversal works
SELECT * FROM MY_DATABASE.MY_SCHEMA.LINEAGE LIMIT 10;
Refresh data
The BASE_EDGES table is a materialized snapshot. To pick up new lineage connections, re-run the Step 1 CREATE OR REPLACE TABLE statement on a schedule that suits your needs (for example, hourly or daily). The LINEAGE view always reads live data from BASE_EDGES, so it doesn't need a separate refresh.
Next steps
Once you've created the LINEAGE view, you can run advanced lineage queries:
- Lineage analysis for query impact analysis, root cause analysis, and tag coverage across your data lineage
- LINEAGE_ADJACENCY_LIST for raw lineage edges and example queries
- GOLD namespace to learn about all available metadata tables in the Lakehouse