Skip to main content

Set up lineage tables

You can create tables to assist with lineage use cases by running the SQL statements below in your warehouse. The LINEAGE table is not part of the native GOLD namespace, but you can build it yourself using data from the entity_metadata namespace and the ASSETS table in the GOLD namespace.

Once created, you can use the LINEAGE table with the lineage use case queries in this documentation.

How it works

The setup creates three objects that build on each other:

  1. LINEAGE_EDGES (view) — Flattens the inputs and outputs arrays from all process tables in entity_metadata into simple (process_guid, input_guid, output_guid) rows.
  2. BASE_EDGES (table) — Enriches LINEAGE_EDGES with human-readable asset names and types from the ASSETS table in the GOLD namespace.
  3. LINEAGE (view) — Recursive traversal on BASE_EDGES that computes multi-hop upstream and downstream lineage with cycle detection.

Prerequisites

  • Lakehouse is enabled for your tenant. See Enable Lakehouse.
  • Your query engine is connected to Lakehouse. See Connect Snowflake or Connect Athena.
  • A database in your warehouse where you can create views and tables (the Lakehouse catalog is read-only).

Create lineage tables

These objects are created in your own database and read from the Lakehouse catalog. Run the following SQL statements in order.

Before you start, replace the placeholder values for your platform:

  • Snowflake: Replace MY_DATABASE.MY_SCHEMA with your target database and schema. Replace ATLAN_CONTEXT_STORE with your Lakehouse catalog database name.
  • Databricks: Replace MY_CATALOG.MY_SCHEMA with your target catalog and schema. Replace LAKEHOUSE_CATALOG with your Lakehouse catalog name in Unity Catalog.
  • BigQuery: Replace MY_PROJECT.MY_DATASET with your target project and dataset. Replace PROJECT_ID.LAKEHOUSE_DATASET with the project and dataset where your Lakehouse external tables are registered.

Step 1: Create LINEAGE_EDGES view

This view flattens process inputs and outputs into individual edges.

CREATE OR REPLACE VIEW MY_DATABASE.MY_SCHEMA.LINEAGE_EDGES (
process_guid,
input_guid,
output_guid
)
COMMENT = 'Flattened lineage edges from all process types'
AS
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".process p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".biprocess p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".columnprocess p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".dbtprocess p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE'
UNION ALL
SELECT
p.guid AS process_guid,
f.value::string AS input_guid,
o.value::string AS output_guid
FROM ATLAN_CONTEXT_STORE."entity_metadata".dbtcolumnprocess p,
LATERAL FLATTEN(inputs) f,
LATERAL FLATTEN(outputs) o
WHERE status = 'ACTIVE';

Step 2: Create BASE_EDGES table

This table enriches edges with asset names and types from the GOLD namespace ASSETS table.

CREATE OR REPLACE TABLE MY_DATABASE.MY_SCHEMA.BASE_EDGES (
process_guid,
input_guid,
input_name,
input_type,
output_guid,
output_name,
output_type
)
COMMENT = 'Lineage edges enriched with asset names and types'
AS
SELECT DISTINCT
e.process_guid,
e.input_guid,
COALESCE(i.asset_name, e.input_guid) AS input_name,
COALESCE(i.asset_type, 'unknown') AS input_type,
e.output_guid,
COALESCE(o.asset_name, e.output_guid) AS output_name,
COALESCE(o.asset_type, 'unknown') AS output_type
FROM MY_DATABASE.MY_SCHEMA.LINEAGE_EDGES e
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM ATLAN_CONTEXT_STORE."gold".assets
) i ON e.input_guid = i.guid
LEFT JOIN (
SELECT DISTINCT guid, asset_name, asset_type
FROM ATLAN_CONTEXT_STORE."gold".assets
) o ON e.output_guid = o.guid
WHERE e.input_guid IS NOT NULL
AND e.output_guid IS NOT NULL;

Step 3: Create LINEAGE view

This view computes multi-hop upstream and downstream lineage using a recursive CTE.

CREATE OR REPLACE VIEW MY_DATABASE.MY_SCHEMA.LINEAGE (
direction,
start_guid,
start_name,
start_type,
related_guid,
related_name,
related_type,
connecting_guid,
level
)
COMMENT = 'Multi-hop upstream and downstream lineage for all assets'
AS
WITH
DOWNSTREAM AS (
SELECT
input_guid AS start_guid,
input_name AS start_name,
input_type AS start_type,
output_guid AS related_guid,
output_name AS related_name,
output_type AS related_type,
input_guid AS connecting_guid,
1 AS level,
input_guid || ',' || output_guid AS path_str
FROM MY_DATABASE.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
d.start_guid,
d.start_name,
d.start_type,
e.output_guid AS related_guid,
e.output_name AS related_name,
e.output_type AS related_type,
d.related_guid AS connecting_guid,
d.level + 1 AS level,
d.path_str || ',' || e.output_guid AS path_str
FROM DOWNSTREAM d
JOIN MY_DATABASE.MY_SCHEMA.BASE_EDGES e
ON d.related_guid = e.input_guid
WHERE POSITION(e.output_guid IN d.path_str) = 0
),
UPSTREAM AS (
SELECT
output_guid AS start_guid,
output_name AS start_name,
output_type AS start_type,
input_guid AS related_guid,
input_name AS related_name,
input_type AS related_type,
output_guid AS connecting_guid,
1 AS level,
output_guid || ',' || input_guid AS path_str
FROM MY_DATABASE.MY_SCHEMA.BASE_EDGES
UNION ALL
SELECT
u.start_guid,
u.start_name,
u.start_type,
e.input_guid AS related_guid,
e.input_name AS related_name,
e.input_type AS related_type,
u.related_guid AS connecting_guid,
u.level + 1 AS level,
u.path_str || ',' || e.input_guid AS path_str
FROM UPSTREAM u
JOIN MY_DATABASE.MY_SCHEMA.BASE_EDGES e
ON u.related_guid = e.output_guid
WHERE POSITION(e.input_guid IN u.path_str) = 0
)
SELECT DISTINCT
'DOWNSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM DOWNSTREAM
WHERE related_guid IS NOT NULL
UNION ALL
SELECT DISTINCT
'UPSTREAM' AS direction,
start_guid, start_name, start_type,
related_guid, related_name, related_type,
connecting_guid, level
FROM UPSTREAM
WHERE related_guid IS NOT NULL;

Refresh the data

The BASE_EDGES table is a materialized snapshot. To pick up new lineage connections, re-run the Step 2 CREATE OR REPLACE TABLE statement on a schedule that suits your needs (for example, hourly or daily). The LINEAGE_EDGES and LINEAGE views always read live data from BASE_EDGES and the Lakehouse catalog, so they don't need separate refreshes.

Verify setup

After creating all three objects, run a quick validation:

-- Check that edges were populated
SELECT COUNT(*) FROM MY_DATABASE.MY_SCHEMA.BASE_EDGES;

-- Check that lineage traversal works
SELECT * FROM MY_DATABASE.MY_SCHEMA.LINEAGE LIMIT 10;

See also