Metrics and glossary alignment Private Preview
Identify and consolidate duplicate metrics and glossary terms across your organization using Lakehouse (MDLH) to keep reporting consistent and maintain a single source of truth.
This page provides SQL templates and guidance for metrics and glossary alignment across Tableau, Power BI, and glossary terms.
Before you begin
Before you run these queries, make sure:
- Your Lakehouse setup is complete and Lakehouse is enabled for your organization. If setup isn't complete, see Get started with Lakehouse.
- The Gold layer is set up for your SQL engine. See Set up gold layer in Snowflake. For Databricks (Databricks setup guide) or BigQuery (BigQuery setup guide) setups, contact your Atlan Customer Success team.
You can use SQL queries directly on metadata tables in the Lakehouse to identify duplicate or similar metrics and glossary terms. Below are example queries for common alignment scenarios.
Identify potentially duplicate Tableau calculated fields
Find Tableau Calculated Fields that have the same name or formula. Two Calculated Fields are potentially duplicate if they have either the same name or the same formula. This helps identify redundant logic across different workbooks.
- Snowflake
SELECT
t1.NAME AS "Calculated Field 1 Name",
t1.GUID AS "Calculated Field 1 GUID",
t1.WORKBOOKQUALIFIEDNAME AS "Calculated Field 1 Workbook",
t1.FORMULA AS "Calculated Field 1 Formula",
t2.NAME AS "Calculated Field 2 Name",
t2.GUID AS "Calculated Field 2 GUID",
t2.WORKBOOKQUALIFIEDNAME AS "Calculated Field 2 Workbook",
t2.FORMULA AS "Calculated Field 2 Formula"
FROM
TABLEAUCALCULATEDFIELD_ENTITY AS t1
JOIN
TABLEAUCALCULATEDFIELD_ENTITY AS t2
ON
-- Condition 1: Find rows where the calculated field names OR formulas are identical.
-- This is the core condition for finding all potential duplicates.
(LOWER(TRIM(t1.NAME)) = LOWER(TRIM(t2.NAME)) OR t1.FORMULA = t2.FORMULA)
WHERE
-- Condition 2: Exclude cases where a row is being compared to itself.
-- This also ensures that the two fields are unique, as each has a unique GUID.
t1.GUID < t2.GUID
ORDER BY
"Calculated Field 1 Name";
This query returns:
- Calculated fields with identical names across different workbooks
- Calculated fields with identical formulas (helping eliminate redundant logic)
- Pairs of potentially duplicate metrics for review and consolidation
Identify potentially duplicate Power BI measures
Find Power BI Measures that have the same name. This helps identify redundant metric definitions across different datasets or reports.
- Snowflake
SELECT
-- Select the name and GUID for the first measure in the pair.
t1.NAME "MEASURE 1 NAME",
t1.GUID "MEASURE 1 GUID",
t1.POWERBIMEASUREEXPRESSION "MEASURE 1 EXPRESSION",
-- Select the name and GUID for the second measure in the pair.
t2.NAME "MEASURE 2 NAME",
t2.GUID "MEASURE 2 GUID",
t2.POWERBIMEASUREEXPRESSION "MEASURE 2 EXPRESSION"
FROM
-- The first instance of the table.
POWERBIMEASURE_ENTITY AS t1
-- The second instance of the table.
JOIN
POWERBIMEASURE_ENTITY AS t2
ON
-- Condition 1: Find rows where the measure names are the same.
-- This is the core condition for finding duplicates.
LOWER(TRIM(t1.NAME)) = LOWER(TRIM(t2.NAME))
WHERE
-- Condition 3: Exclude cases where a row is being compared to itself.
-- This prevents the query from returning a measure paired with itself.
t1.GUID < t2.GUID
ORDER BY
"MEASURE 1 NAME";
This query returns:
- Power BI measures with identical names across different datasets or reports
- Measure expressions for comparison to identify if they calculate the same metric
- Pairs of potentially duplicate metrics for review and consolidation
You can modify the strictness of text matching by updating LOWER(TRIM(t1.NAME)) = LOWER(TRIM(t2.NAME)) to LOWER(TRIM(t1.NAME)) LIKE CONCAT(LOWER(TRIM(t2.NAME)), '%') for fuzzy matching.
Once you identify potentially duplicate Power BI Measures, consider consolidating them into a dedicated table. See Microsoft Power BI documentation for more information.
Identify potentially duplicate glossary terms
For organizations with a core "Enterprise Business Glossary" as the source of truth, identify terms in other glossaries that may duplicate terms in the core glossary using similarity matching.
- Snowflake
-- CTE 1: Select and standardize terms ONLY from the "Core Business" glossary (Source of Truth)
WITH core_terms AS (
SELECT
NAME AS core_name,
GUID AS core_guid,
USERDESCRIPTION AS core_description
FROM
GLOSSARYTERM_ENTITY
WHERE
-- Filter terms belonging to the Core Business Glossary GUID
ARRAY_CONTAINS('core-glossary-guid', ANCHOR)
),
-- CTE 2: Select and standardize terms from ALL OTHER glossaries (Potential Duplicates)
non_core_terms AS (
SELECT
NAME AS non_core_name,
GUID AS non_core_guid,
USERDESCRIPTION AS non_core_description,
ANCHOR AS non_core_anchor_guid
FROM
GLOSSARYTERM_ENTITY
WHERE
-- Exclude terms from the Core Business Glossary
NOT(ARRAY_CONTAINS('core-glossary-guid', ANCHOR))
),
-- CTE 3: Look up Glossary Names and GUIDs
glossary_lookup AS (
SELECT
GUID AS glossary_guid,
NAME AS glossary_name
FROM
GLOSSARY_ENTITY
)
-- Main Query: Join the two term sets and return matches
SELECT DISTINCT
T1.core_name AS source_of_truth_name,
T1.core_guid AS source_of_truth_guid,
T1.core_description AS source_of_truth_description,
T2.non_core_name AS potential_duplicate_name,
T2.non_core_guid AS potential_duplicate_guid,
T2.non_core_description AS potential_duplicate_description,
T3.glossary_name AS potential_duplicate_glossary_name,
T3.glossary_guid AS potential_duplicate_glossary_guid,
JAROWINKLER_SIMILARITY(T1.core_name, T2.non_core_name) AS similarity_score
FROM
core_terms AS T1
INNER JOIN
non_core_terms AS T2
-- Adjust this similarity score if needed - a similarity score of 100 = exact match
ON JAROWINKLER_SIMILARITY(T1.core_name, T2.non_core_name) >= 95
AND T1.core_guid != T2.non_core_guid
INNER JOIN
glossary_lookup AS T3
-- Join to find the Glossary Name and GUID by checking if the Glossary GUID is in the term's ANCHOR array
ON ARRAY_CONTAINS(T3.glossary_guid, T2.non_core_anchor_guid)
ORDER BY
similarity_score DESC, potential_duplicate_glossary_name, source_of_truth_name;
This query returns:
- Glossary terms from non-core glossaries that are similar to terms in the core glossary
- Similarity scores to help prioritize which terms to review (100 = exact match)
- Glossary names and descriptions for both source of truth and potential duplicates
Customization:
- Replace
'core-glossary-guid'with the actual GUID of your core business glossary - Adjust the similarity threshold (currently set to
>= 95) if needed. A similarity score of 100 = exact match
Reference
Count queries
Use these queries to get the total number of potentially duplicate metrics or glossary terms.
Number of potentially duplicate Tableau calculated fields
WITH POTENTIAL_DUPLICATES AS
(
SELECT
t1.NAME AS "Calculated Field 1 Name",
t1.GUID AS "Calculated Field 1 GUID",
t1.WORKBOOKQUALIFIEDNAME AS "Calculated Field 1 Workbook",
t1.FORMULA AS "Calculated Field 1 Formula",
t2.NAME AS "Calculated Field 2 Name",
t2.GUID AS "Calculated Field 2 GUID",
t2.WORKBOOKQUALIFIEDNAME AS "Calculated Field 2 Workbook",
t2.FORMULA AS "Calculated Field 2 Formula"
FROM
TABLEAUCALCULATEDFIELD_ENTITY AS t1
JOIN
TABLEAUCALCULATEDFIELD_ENTITY AS t2
ON
(LOWER(TRIM(t1.NAME)) LIKE CONCAT(LOWER(TRIM(t2.NAME)), '%') OR t1.FORMULA = t2.FORMULA)
WHERE
t1.GUID < t2.GUID
ORDER BY
"Calculated Field 1 Name"
)
SELECT COUNT(DISTINCT "Calculated Field 1 GUID") FROM POTENTIAL_DUPLICATES;
Number of potentially duplicate Power BI measures
WITH POTENTIAL_DUPLICATES AS
(
SELECT
t1.NAME "MEASURE 1 NAME",
t1.GUID "MEASURE 1 GUID",
t1.POWERBIMEASUREEXPRESSION "MEASURE 1 EXPRESSION",
t2.NAME "MEASURE 2 NAME",
t2.GUID "MEASURE 2 GUID",
t2.POWERBIMEASUREEXPRESSION "MEASURE 2 EXPRESSION"
FROM
POWERBIMEASURE_ENTITY AS t1
JOIN
POWERBIMEASURE_ENTITY AS t2
ON
LOWER(TRIM(t1.NAME)) LIKE CONCAT(LOWER(TRIM(t2.NAME)), '%')
WHERE
t1.GUID < t2.GUID
ORDER BY
"MEASURE 1 NAME"
)
SELECT COUNT(DISTINCT "MEASURE 1 GUID") FROM POTENTIAL_DUPLICATES;
Column reference
| Column | Description |
|---|---|
NAME | Metric or glossary term name |
GUID | Unique identifier for the metric or term |
FORMULA (Tableau) | Calculated field formula |
POWERBIMEASUREEXPRESSION (Power BI) | Measure expression |
WORKBOOKQUALIFIEDNAME (Tableau) | Fully qualified workbook name |
USERDESCRIPTION (Glossary) | Term description |
ANCHOR (Glossary) | Array containing glossary GUIDs the term belongs to |
JAROWINKLER_SIMILARITY | Similarity score function (100 = exact match) |