Set up lineage tables
You can create tables to assist with lineage use cases by running the SQL statements below in your warehouse. The LINEAGE table is not part of the native GOLD namespace, but you can build it yourself using data from the entity_metadata namespace and the ASSETS table in the GOLD namespace.
Once created, you can use the LINEAGE table with the lineage use case queries in this documentation.
How it works
The setup creates three objects that build on each other:
LINEAGE_EDGES(view) — Flattens theinputsandoutputsarrays from all process tables inentity_metadatainto simple(process_guid, input_guid, output_guid)rows.BASE_EDGES(table) — EnrichesLINEAGE_EDGESwith human-readable asset names and types from theASSETStable in the GOLD namespace.LINEAGE(view) — Recursive traversal onBASE_EDGESthat computes multi-hop upstream and downstream lineage with cycle detection.
Prerequisites
- Lakehouse is enabled for your tenant. See Enable Lakehouse.
- Your query engine is connected to Lakehouse. See Connect Snowflake or Connect Athena.
- A database in your warehouse where you can create views and tables (the Lakehouse catalog is read-only).
Create lineage tables
These objects are created in your own database and read from the Lakehouse catalog. Run the following SQL statements in order.
Before you start, replace the placeholder values for your platform:
- Snowflake: Replace
MY_DATABASE.MY_SCHEMAwith your target database and schema. ReplaceATLAN_CONTEXT_STOREwith your Lakehouse catalog database name. - Databricks: Replace
MY_CATALOG.MY_SCHEMAwith your target catalog and schema. ReplaceLAKEHOUSE_CATALOGwith your Lakehouse catalog name in Unity Catalog. - BigQuery: Replace
MY_PROJECT.MY_DATASETwith your target project and dataset. ReplacePROJECT_ID.LAKEHOUSE_DATASETwith the project and dataset where your Lakehouse external tables are registered.
- Snowflake
- Databricks
- BigQuery
Step 1: Create LINEAGE_EDGES view
This view flattens process inputs and outputs into individual edges.
CREATE OR REPLACE VIEW MY_DATABASE.MY_SCHEMA.LINEAGE_EDGES (
process_guid,
input_guid,
output_guid
)
COMMENT = 'Flattened lineage edges from all process types'
AS
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".process p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".biprocess p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".columnprocess p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".dbtprocess p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".dbtcolumnprocess p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE';
Step 2: Create BASE_EDGES table
This table enriches edges with asset names and types from the GOLD namespace ASSETS table.
CREATE OR REPLACE TABLE MY_DATABASE.MY_SCHEMA.BASE_EDGES (
process_guid,
input_guid,
input_name,
input_type,
output_guid,
output_name,
output_type
)
COMMENT = 'Lineage edges enriched with asset names and types'
AS
SELECT DISTINCT
e.process_guid,
e.input_guid,
COALESCE(i.asset_name, e.input_guid) AS input_name,
COALESCE(i.asset_type, 'unknown') AS input_type,
e.output_guid,
COALESCE(o.asset_name, e.output_guid) AS output_name,
COALESCE(o.asset_type, 'unknown') AS output_type
FROM MY_DATABASE.MY_SCHEMA.LINEAGE_EDGES e
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM ATLAN_CONTEXT_STORE."gold".assets
) i ON e.input_guid = i.guid
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM ATLAN_CONTEXT_STORE."gold".assets
) o ON e.output_guid = o.guid
WHERE e.input_guid IS NOT NULL
AND e.output_guid IS NOT NULL;
Step 3: Create LINEAGE view
This view computes multi-hop upstream and downstream lineage using a recursive CTE.
CREATE OR REPLACE VIEW MY_DATABASE.MY_SCHEMA.LINEAGE (
direction,
start_guid,
start_name,
start_type,
related_guid,
related_name,
related_type,
connecting_guid,
level
)
COMMENT = 'Multi-hop upstream and downstream lineage for all assets'
AS
WITH
DOWNSTREAM AS (
SELECT
input_guid AS start_guid,
input_name AS start_name,
input_type AS start_type,
output_guid AS related_guid,
output_name AS related_name,
output_type AS related_type,
input_guid AS connecting_guid,
1 AS level,
input_guid || ',' || output_guid AS path_str
FROM MY_DATABASE.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
d.start_guid,
d.start_name,
d.start_type,
e.output_guid AS related_guid,
e.output_name AS related_name,
e.output_type AS related_type,
d.related_guid AS connecting_guid,
d.level + 1 AS level,
d.path_str || ',' || e.output_guid AS path_str
FROM DOWNSTREAM d
JOIN MY_DATABASE.MY_SCHEMA.BASE_EDGES e
ON d.related_guid = e.input_guid
WHERE POSITION(e.output_guid IN d.path_str) = 0
),
UPSTREAM AS (
SELECT
output_guid AS start_guid,
output_name AS start_name,
output_type AS start_type,
input_guid AS related_guid,
input_name AS related_name,
input_type AS related_type,
output_guid AS connecting_guid,
1 AS level,
output_guid || ',' || input_guid AS path_str
FROM MY_DATABASE.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
u.start_guid,
u.start_name,
u.start_type,
e.input_guid AS related_guid,
e.input_name AS related_name,
e.input_type AS related_type,
u.related_guid AS connecting_guid,
u.level + 1 AS level,
u.path_str || ',' || e.input_guid AS path_str
FROM UPSTREAM u
JOIN MY_DATABASE.MY_SCHEMA.BASE_EDGES e
ON u.related_guid = e.output_guid
WHERE POSITION(e.input_guid IN u.path_str) = 0
)
SELECT DISTINCT
'DOWNSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM DOWNSTREAM
WHERE related_guid IS NOT NULL
UNION ALL
SELECT DISTINCT
'UPSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM UPSTREAM
WHERE related_guid IS NOT NULL;
Step 1: Create LINEAGE_EDGES view
This view flattens process inputs and outputs into individual edges.
CREATE OR REPLACE VIEW MY_CATALOG.MY_SCHEMA.LINEAGE_EDGES (
process_guid,
input_guid,
output_guid
)
COMMENT 'Flattened lineage edges from all process types'
AS
SELECT
p.guid AS process_guid,
f.col AS input_guid,
o.col AS output_guid
FROM LAKEHOUSE_CATALOG.entity_metadata.process p
LATERAL VIEW EXPLODE(p.inputs) f
LATERAL VIEW EXPLODE(p.outputs) o
WHERE p.status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.col AS input_guid,
o.col AS output_guid
FROM LAKEHOUSE_CATALOG.entity_metadata.biprocess p
LATERAL VIEW EXPLODE(p.inputs) f
LATERAL VIEW EXPLODE(p.outputs) o
WHERE p.status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.col AS input_guid,
o.col AS output_guid
FROM LAKEHOUSE_CATALOG.entity_metadata.columnprocess p
LATERAL VIEW EXPLODE(p.inputs) f
LATERAL VIEW EXPLODE(p.outputs) o
WHERE p.status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.col AS input_guid,
o.col AS output_guid
FROM LAKEHOUSE_CATALOG.entity_metadata.dbtprocess p
LATERAL VIEW EXPLODE(p.inputs) f
LATERAL VIEW EXPLODE(p.outputs) o
WHERE p.status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.col AS input_guid,
o.col AS output_guid
FROM LAKEHOUSE_CATALOG.entity_metadata.dbtcolumnprocess p
LATERAL VIEW EXPLODE(p.inputs) f
LATERAL VIEW EXPLODE(p.outputs) o
WHERE p.status = 'ACTIVE';
Step 2: Create BASE_EDGES materialized view
This materialized view enriches edges with asset names and types from the GOLD namespace ASSETS table.
CREATE OR REPLACE MATERIALIZED VIEW MY_CATALOG.MY_SCHEMA.BASE_EDGES
COMMENT 'Lineage edges enriched with asset names and types'
AS
SELECT DISTINCT
e.process_guid,
e.input_guid,
COALESCE(i.asset_name, e.input_guid) AS input_name,
COALESCE(i.asset_type, 'unknown') AS input_type,
e.output_guid,
COALESCE(o.asset_name, e.output_guid) AS output_name,
COALESCE(o.asset_type, 'unknown') AS output_type
FROM MY_CATALOG.MY_SCHEMA.LINEAGE_EDGES e
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM LAKEHOUSE_CATALOG.gold.assets
) i ON e.input_guid = i.guid
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM LAKEHOUSE_CATALOG.gold.assets
) o ON e.output_guid = o.guid
WHERE e.input_guid IS NOT NULL
AND e.output_guid IS NOT NULL;
Step 3: Create LINEAGE view
This view computes multi-hop upstream and downstream lineage using a recursive CTE.
CREATE OR REPLACE VIEW MY_CATALOG.MY_SCHEMA.LINEAGE (
direction,
start_guid,
start_name,
start_type,
related_guid,
related_name,
related_type,
connecting_guid,
level
)
COMMENT 'Multi-hop upstream and downstream lineage for all assets'
AS
WITH RECURSIVE
DOWNSTREAM AS (
SELECT
input_guid AS start_guid,
input_name AS start_name,
input_type AS start_type,
output_guid AS related_guid,
output_name AS related_name,
output_type AS related_type,
input_guid AS connecting_guid,
1 AS level,
input_guid || ',' || output_guid AS path_str
FROM MY_CATALOG.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
d.start_guid,
d.start_name,
d.start_type,
e.output_guid AS related_guid,
e.output_name AS related_name,
e.output_type AS related_type,
d.related_guid AS connecting_guid,
d.level + 1 AS level,
d.path_str || ',' || e.output_guid AS path_str
FROM DOWNSTREAM d
JOIN MY_CATALOG.MY_SCHEMA.BASE_EDGES e
ON d.related_guid = e.input_guid
WHERE LOCATE(e.output_guid, d.path_str) = 0
),
UPSTREAM AS (
SELECT
output_guid AS start_guid,
output_name AS start_name,
output_type AS start_type,
input_guid AS related_guid,
input_name AS related_name,
input_type AS related_type,
output_guid AS connecting_guid,
1 AS level,
output_guid || ',' || input_guid AS path_str
FROM MY_CATALOG.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
u.start_guid,
u.start_name,
u.start_type,
e.input_guid AS related_guid,
e.input_name AS related_name,
e.input_type AS related_type,
u.related_guid AS connecting_guid,
u.level + 1 AS level,
u.path_str || ',' || e.input_guid AS path_str
FROM UPSTREAM u
JOIN MY_CATALOG.MY_SCHEMA.BASE_EDGES e
ON u.related_guid = e.output_guid
WHERE LOCATE(e.input_guid, u.path_str) = 0
)
SELECT DISTINCT
'DOWNSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM DOWNSTREAM
WHERE related_guid IS NOT NULL
UNION ALL
SELECT DISTINCT
'UPSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM UPSTREAM
WHERE related_guid IS NOT NULL;
Step 1: Create LINEAGE_EDGES view
This view flattens process inputs and outputs into individual edges. Replace PROJECT_ID.LAKEHOUSE_DATASET with the BigQuery dataset where your Lakehouse external tables are registered.
CREATE OR REPLACE VIEW MY_PROJECT.MY_DATASET.LINEAGE_EDGES (
process_guid OPTIONS(description='Process GUID'),
input_guid OPTIONS(description='Upstream asset GUID'),
output_guid OPTIONS(description='Downstream asset GUID')
)
OPTIONS(description='Flattened lineage edges from all process types')
AS
SELECT
p.guid AS process_guid,
f AS input_guid,
o AS output_guid
FROM `PROJECT_ID.LAKEHOUSE_DATASET.Process` p,
UNNEST(p.inputs) AS f,
UNNEST(p.outputs) AS o
WHERE p.status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f AS input_guid,
o AS output_guid
FROM `PROJECT_ID.LAKEHOUSE_DATASET.BIProcess` p,
UNNEST(p.inputs) AS f,
UNNEST(p.outputs) AS o
WHERE p.status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f AS input_guid,
o AS output_guid
FROM `PROJECT_ID.LAKEHOUSE_DATASET.ColumnProcess` p,
UNNEST(p.inputs) AS f,
UNNEST(p.outputs) AS o
WHERE p.status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f AS input_guid,
o AS output_guid
FROM `PROJECT_ID.LAKEHOUSE_DATASET.DbtProcess` p,
UNNEST(p.inputs) AS f,
UNNEST(p.outputs) AS o
WHERE p.status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f AS input_guid,
o AS output_guid
FROM `PROJECT_ID.LAKEHOUSE_DATASET.DbtColumnProcess` p,
UNNEST(p.inputs) AS f,
UNNEST(p.outputs) AS o
WHERE p.status = 'ACTIVE';
Step 2: Create BASE_EDGES table
This table enriches edges with asset names and types from the GOLD namespace ASSETS table.
CREATE OR REPLACE TABLE MY_PROJECT.MY_DATASET.BASE_EDGES
OPTIONS(description='Lineage edges enriched with asset names and types')
AS
SELECT DISTINCT
e.process_guid,
e.input_guid,
COALESCE(i.asset_name, e.input_guid) AS input_name,
COALESCE(i.asset_type, 'unknown') AS input_type,
e.output_guid,
COALESCE(o.asset_name, e.output_guid) AS output_name,
COALESCE(o.asset_type, 'unknown') AS output_type
FROM MY_PROJECT.MY_DATASET.LINEAGE_EDGES e
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM `PROJECT_ID.LAKEHOUSE_DATASET.assets`
) i ON e.input_guid = i.guid
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM `PROJECT_ID.LAKEHOUSE_DATASET.assets`
) o ON e.output_guid = o.guid
WHERE e.input_guid IS NOT NULL
AND e.output_guid IS NOT NULL;
Step 3: Create LINEAGE view
This view computes multi-hop upstream and downstream lineage using a recursive CTE.
CREATE OR REPLACE VIEW MY_PROJECT.MY_DATASET.LINEAGE
OPTIONS(description='Multi-hop upstream and downstream lineage for all assets')
AS
WITH RECURSIVE
DOWNSTREAM AS (
SELECT
input_guid AS start_guid,
input_name AS start_name,
input_type AS start_type,
output_guid AS related_guid,
output_name AS related_name,
output_type AS related_type,
input_guid AS connecting_guid,
1 AS level,
input_guid || ',' || output_guid AS path_str
FROM MY_PROJECT.MY_DATASET.BASE_EDGES
UNION ALL
SELECT
d.start_guid,
d.start_name,
d.start_type,
e.output_guid AS related_guid,
e.output_name AS related_name,
e.output_type AS related_type,
d.related_guid AS connecting_guid,
d.level + 1 AS level,
d.path_str || ',' || e.output_guid AS path_str
FROM DOWNSTREAM d
JOIN MY_PROJECT.MY_DATASET.BASE_EDGES e
ON d.related_guid = e.input_guid
WHERE STRPOS(d.path_str, e.output_guid) = 0
),
UPSTREAM AS (
SELECT
output_guid AS start_guid,
output_name AS start_name,
output_type AS start_type,
input_guid AS related_guid,
input_name AS related_name,
input_type AS related_type,
output_guid AS connecting_guid,
1 AS level,
output_guid || ',' || input_guid AS path_str
FROM MY_PROJECT.MY_DATASET.BASE_EDGES
UNION ALL
SELECT
u.start_guid,
u.start_name,
u.start_type,
e.input_guid AS related_guid,
e.input_name AS related_name,
e.input_type AS related_type,
u.related_guid AS connecting_guid,
u.level + 1 AS level,
u.path_str || ',' || e.input_guid AS path_str
FROM UPSTREAM u
JOIN MY_PROJECT.MY_DATASET.BASE_EDGES e
ON u.related_guid = e.output_guid
WHERE STRPOS(u.path_str, e.input_guid) = 0
)
SELECT DISTINCT
'DOWNSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM DOWNSTREAM
WHERE related_guid IS NOT NULL
UNION ALL
SELECT DISTINCT
'UPSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM UPSTREAM
WHERE related_guid IS NOT NULL;
Refresh the data
The BASE_EDGES table is a materialized snapshot. To pick up new lineage connections, re-run the Step 2 CREATE OR REPLACE TABLE statement on a schedule that suits your needs (for example, hourly or daily). The LINEAGE_EDGES and LINEAGE views always read live data from BASE_EDGES and the Lakehouse catalog, so they don't need separate refreshes.
Verify setup
After creating all three objects, run a quick validation:
-- Check that edges were populated
SELECT COUNT(*) FROM MY_DATABASE.MY_SCHEMA.BASE_EDGES;
-- Check that lineage traversal works
SELECT * FROM MY_DATABASE.MY_SCHEMA.LINEAGE LIMIT 10;