Skip to main content

Metrics and glossary alignment Private Preview

Identify and consolidate duplicate metrics and glossary terms across your organization using Lakehouse (MDLH) to keep reporting consistent and maintain a single source of truth.

This page provides SQL templates and guidance for metrics and glossary alignment across Tableau, Power BI, and glossary terms.

Before you begin

Before you run these queries, make sure:

You can use SQL queries directly on metadata tables in the Lakehouse to identify duplicate or similar metrics and glossary terms. Below are example queries for common alignment scenarios.

Identify potentially duplicate Tableau calculated fields

Find Tableau Calculated Fields that have the same name or formula. Two Calculated Fields are potentially duplicate if they have either the same name or the same formula. This helps identify redundant logic across different workbooks.

SELECT
t1.NAME AS "Calculated Field 1 Name",
t1.GUID AS "Calculated Field 1 GUID",
t1.WORKBOOKQUALIFIEDNAME AS "Calculated Field 1 Workbook",
t1.FORMULA AS "Calculated Field 1 Formula",
t2.NAME AS "Calculated Field 2 Name",
t2.GUID AS "Calculated Field 2 GUID",
t2.WORKBOOKQUALIFIEDNAME AS "Calculated Field 2 Workbook",
t2.FORMULA AS "Calculated Field 2 Formula"
FROM
TABLEAUCALCULATEDFIELD_ENTITY AS t1
JOIN
TABLEAUCALCULATEDFIELD_ENTITY AS t2
ON
-- Condition 1: Find rows where the calculated field names OR formulas are identical.
-- This is the core condition for finding all potential duplicates.
(LOWER(TRIM(t1.NAME)) = LOWER(TRIM(t2.NAME)) OR t1.FORMULA = t2.FORMULA)
WHERE
-- Condition 2: Exclude cases where a row is being compared to itself.
-- This also ensures that the two fields are unique, as each has a unique GUID.
t1.GUID < t2.GUID
ORDER BY
"Calculated Field 1 Name";

This query returns:

  • Calculated fields with identical names across different workbooks
  • Calculated fields with identical formulas (helping eliminate redundant logic)
  • Pairs of potentially duplicate metrics for review and consolidation

Identify potentially duplicate Power BI measures

Find Power BI Measures that have the same name. This helps identify redundant metric definitions across different datasets or reports.

SELECT
-- Select the name and GUID for the first measure in the pair.
t1.NAME "MEASURE 1 NAME",
t1.GUID "MEASURE 1 GUID",
t1.POWERBIMEASUREEXPRESSION "MEASURE 1 EXPRESSION",
-- Select the name and GUID for the second measure in the pair.
t2.NAME "MEASURE 2 NAME",
t2.GUID "MEASURE 2 GUID",
t2.POWERBIMEASUREEXPRESSION "MEASURE 2 EXPRESSION"
FROM
-- The first instance of the table.
POWERBIMEASURE_ENTITY AS t1
-- The second instance of the table.
JOIN
POWERBIMEASURE_ENTITY AS t2
ON
-- Condition 1: Find rows where the measure names are the same.
-- This is the core condition for finding duplicates.
LOWER(TRIM(t1.NAME)) = LOWER(TRIM(t2.NAME))
WHERE
-- Condition 3: Exclude cases where a row is being compared to itself.
-- This prevents the query from returning a measure paired with itself.
t1.GUID < t2.GUID
ORDER BY
"MEASURE 1 NAME";

This query returns:

  • Power BI measures with identical names across different datasets or reports
  • Measure expressions for comparison to identify if they calculate the same metric
  • Pairs of potentially duplicate metrics for review and consolidation
tip

You can modify the strictness of text matching by updating LOWER(TRIM(t1.NAME)) = LOWER(TRIM(t2.NAME)) to LOWER(TRIM(t1.NAME)) LIKE CONCAT(LOWER(TRIM(t2.NAME)), '%') for fuzzy matching.

Once you identify potentially duplicate Power BI Measures, consider consolidating them into a dedicated table. See Microsoft Power BI documentation for more information.

Identify potentially duplicate glossary terms

For organizations with a core "Enterprise Business Glossary" as the source of truth, identify terms in other glossaries that may duplicate terms in the core glossary using similarity matching.

-- CTE 1: Select and standardize terms ONLY from the "Core Business" glossary (Source of Truth)
WITH core_terms AS (
SELECT
NAME AS core_name,
GUID AS core_guid,
USERDESCRIPTION AS core_description
FROM
GLOSSARYTERM_ENTITY
WHERE
-- Filter terms belonging to the Core Business Glossary GUID
ARRAY_CONTAINS('core-glossary-guid', ANCHOR)
),
-- CTE 2: Select and standardize terms from ALL OTHER glossaries (Potential Duplicates)
non_core_terms AS (
SELECT
NAME AS non_core_name,
GUID AS non_core_guid,
USERDESCRIPTION AS non_core_description,
ANCHOR AS non_core_anchor_guid
FROM
GLOSSARYTERM_ENTITY
WHERE
-- Exclude terms from the Core Business Glossary
NOT(ARRAY_CONTAINS('core-glossary-guid', ANCHOR))
),
-- CTE 3: Look up Glossary Names and GUIDs
glossary_lookup AS (
SELECT
GUID AS glossary_guid,
NAME AS glossary_name
FROM
GLOSSARY_ENTITY
)
-- Main Query: Join the two term sets and return matches
SELECT DISTINCT
T1.core_name AS source_of_truth_name,
T1.core_guid AS source_of_truth_guid,
T1.core_description AS source_of_truth_description,
T2.non_core_name AS potential_duplicate_name,
T2.non_core_guid AS potential_duplicate_guid,
T2.non_core_description AS potential_duplicate_description,
T3.glossary_name AS potential_duplicate_glossary_name,
T3.glossary_guid AS potential_duplicate_glossary_guid,
JAROWINKLER_SIMILARITY(T1.core_name, T2.non_core_name) AS similarity_score
FROM
core_terms AS T1
INNER JOIN
non_core_terms AS T2
-- Adjust this similarity score if needed - a similarity score of 100 = exact match
ON JAROWINKLER_SIMILARITY(T1.core_name, T2.non_core_name) >= 95
AND T1.core_guid != T2.non_core_guid
INNER JOIN
glossary_lookup AS T3
-- Join to find the Glossary Name and GUID by checking if the Glossary GUID is in the term's ANCHOR array
ON ARRAY_CONTAINS(T3.glossary_guid, T2.non_core_anchor_guid)
ORDER BY
similarity_score DESC, potential_duplicate_glossary_name, source_of_truth_name;

This query returns:

  • Glossary terms from non-core glossaries that are similar to terms in the core glossary
  • Similarity scores to help prioritize which terms to review (100 = exact match)
  • Glossary names and descriptions for both source of truth and potential duplicates
info

Customization:

  • Replace 'core-glossary-guid' with the actual GUID of your core business glossary
  • Adjust the similarity threshold (currently set to >= 95) if needed. A similarity score of 100 = exact match

Reference

Count queries

Use these queries to get the total number of potentially duplicate metrics or glossary terms.

Number of potentially duplicate Tableau calculated fields

WITH POTENTIAL_DUPLICATES AS
(
SELECT
t1.NAME AS "Calculated Field 1 Name",
t1.GUID AS "Calculated Field 1 GUID",
t1.WORKBOOKQUALIFIEDNAME AS "Calculated Field 1 Workbook",
t1.FORMULA AS "Calculated Field 1 Formula",
t2.NAME AS "Calculated Field 2 Name",
t2.GUID AS "Calculated Field 2 GUID",
t2.WORKBOOKQUALIFIEDNAME AS "Calculated Field 2 Workbook",
t2.FORMULA AS "Calculated Field 2 Formula"
FROM
TABLEAUCALCULATEDFIELD_ENTITY AS t1
JOIN
TABLEAUCALCULATEDFIELD_ENTITY AS t2
ON
(LOWER(TRIM(t1.NAME)) LIKE CONCAT(LOWER(TRIM(t2.NAME)), '%') OR t1.FORMULA = t2.FORMULA)
WHERE
t1.GUID < t2.GUID
ORDER BY
"Calculated Field 1 Name"
)
SELECT COUNT(DISTINCT "Calculated Field 1 GUID") FROM POTENTIAL_DUPLICATES;

Number of potentially duplicate Power BI measures

WITH POTENTIAL_DUPLICATES AS
(
SELECT
t1.NAME "MEASURE 1 NAME",
t1.GUID "MEASURE 1 GUID",
t1.POWERBIMEASUREEXPRESSION "MEASURE 1 EXPRESSION",
t2.NAME "MEASURE 2 NAME",
t2.GUID "MEASURE 2 GUID",
t2.POWERBIMEASUREEXPRESSION "MEASURE 2 EXPRESSION"
FROM
POWERBIMEASURE_ENTITY AS t1
JOIN
POWERBIMEASURE_ENTITY AS t2
ON
LOWER(TRIM(t1.NAME)) LIKE CONCAT(LOWER(TRIM(t2.NAME)), '%')
WHERE
t1.GUID < t2.GUID
ORDER BY
"MEASURE 1 NAME"
)
SELECT COUNT(DISTINCT "MEASURE 1 GUID") FROM POTENTIAL_DUPLICATES;

Column reference

ColumnDescription
NAMEMetric or glossary term name
GUIDUnique identifier for the metric or term
FORMULA (Tableau)Calculated field formula
POWERBIMEASUREEXPRESSION (Power BI)Measure expression
WORKBOOKQUALIFIEDNAME (Tableau)Fully qualified workbook name
USERDESCRIPTION (Glossary)Term description
ANCHOR (Glossary)Array containing glossary GUIDs the term belongs to
JAROWINKLER_SIMILARITYSimilarity score function (100 = exact match)