Metrics and glossary alignment Private Preview

Identify and consolidate duplicate metrics and glossary terms across your organization using Lakehouse (MDLH) to keep reporting consistent and maintain a single source of truth.

This page provides SQL templates and guidance for metrics and glossary alignment across Tableau, Power BI, and glossary terms.

Before you begin

Before you run these queries, make sure:

Your Lakehouse setup is complete and Lakehouse is enabled for your organization. If setup isn't complete, see Get started with Lakehouse.
The Gold layer is set up for your SQL engine. See Set up gold layer in Snowflake. For Databricks (Databricks setup guide) or BigQuery (BigQuery setup guide) setups, contact your Atlan Customer Success team.

You can use SQL queries directly on metadata tables in the Lakehouse to identify duplicate or similar metrics and glossary terms. Below are example queries for common alignment scenarios.

Identify potentially duplicate Tableau calculated fields

Find Tableau Calculated Fields that have the same name or formula. Two Calculated Fields are potentially duplicate if they have either the same name or the same formula. This helps identify redundant logic across different workbooks.

Snowflake

SELECT
  t1.NAME AS "Calculated Field 1 Name",
  t1.GUID AS "Calculated Field 1 GUID",
  t1.WORKBOOKQUALIFIEDNAME AS "Calculated Field 1 Workbook",
  t1.FORMULA AS "Calculated Field 1 Formula",
  t2.NAME AS "Calculated Field 2 Name",
  t2.GUID AS "Calculated Field 2 GUID",
  t2.WORKBOOKQUALIFIEDNAME AS "Calculated Field 2 Workbook",
  t2.FORMULA AS "Calculated Field 2 Formula"
FROM
  TABLEAUCALCULATEDFIELD_ENTITY AS t1
JOIN
  TABLEAUCALCULATEDFIELD_ENTITY AS t2
ON
  -- Condition 1: Find rows where the calculated field names OR formulas are identical.
  -- This is the core condition for finding all potential duplicates.
  (LOWER(TRIM(t1.NAME)) = LOWER(TRIM(t2.NAME)) OR t1.FORMULA = t2.FORMULA)
WHERE
  -- Condition 2: Exclude cases where a row is being compared to itself.
  -- This also ensures that the two fields are unique, as each has a unique GUID.
  t1.GUID < t2.GUID
ORDER BY
  "Calculated Field 1 Name";

This query returns:

Calculated fields with identical names across different workbooks
Calculated fields with identical formulas (helping eliminate redundant logic)
Pairs of potentially duplicate metrics for review and consolidation

Identify potentially duplicate Power BI measures

Find Power BI Measures that have the same name. This helps identify redundant metric definitions across different datasets or reports.

Snowflake

SELECT
  -- Select the name and GUID for the first measure in the pair.
  t1.NAME "MEASURE 1 NAME",
  t1.GUID "MEASURE 1 GUID",
  t1.POWERBIMEASUREEXPRESSION "MEASURE 1 EXPRESSION",
  -- Select the name and GUID for the second measure in the pair.
  t2.NAME "MEASURE 2 NAME",
  t2.GUID "MEASURE 2 GUID",
  t2.POWERBIMEASUREEXPRESSION "MEASURE 2 EXPRESSION"
FROM
  -- The first instance of the table.
  POWERBIMEASURE_ENTITY AS t1
  -- The second instance of the table.
JOIN
  POWERBIMEASURE_ENTITY AS t2
ON
  -- Condition 1: Find rows where the measure names are the same.
  -- This is the core condition for finding duplicates.
  LOWER(TRIM(t1.NAME)) = LOWER(TRIM(t2.NAME))
WHERE
  -- Condition 3: Exclude cases where a row is being compared to itself.
  -- This prevents the query from returning a measure paired with itself.
  t1.GUID < t2.GUID
ORDER BY
  "MEASURE 1 NAME";

This query returns:

Power BI measures with identical names across different datasets or reports
Measure expressions for comparison to identify if they calculate the same metric
Pairs of potentially duplicate metrics for review and consolidation

tip

You can modify the strictness of text matching by updating LOWER(TRIM(t1.NAME)) = LOWER(TRIM(t2.NAME)) to LOWER(TRIM(t1.NAME)) LIKE CONCAT(LOWER(TRIM(t2.NAME)), '%') for fuzzy matching.

Once you identify potentially duplicate Power BI Measures, consider consolidating them into a dedicated table. See Microsoft Power BI documentation for more information.

Identify potentially duplicate glossary terms

For organizations with a core "Enterprise Business Glossary" as the source of truth, identify terms in other glossaries that may duplicate terms in the core glossary using similarity matching.

Snowflake

-- CTE 1: Select and standardize terms ONLY from the "Core Business" glossary (Source of Truth)
WITH core_terms AS (
    SELECT
        NAME AS core_name,
        GUID AS core_guid,
        USERDESCRIPTION AS core_description
    FROM
        GLOSSARYTERM_ENTITY
    WHERE
        -- Filter terms belonging to the Core Business Glossary GUID
        ARRAY_CONTAINS('core-glossary-guid', ANCHOR)
),
-- CTE 2: Select and standardize terms from ALL OTHER glossaries (Potential Duplicates)
non_core_terms AS (
    SELECT
        NAME AS non_core_name,
        GUID AS non_core_guid,
        USERDESCRIPTION AS non_core_description,
        ANCHOR AS non_core_anchor_guid
    FROM
        GLOSSARYTERM_ENTITY
    WHERE
        -- Exclude terms from the Core Business Glossary
        NOT(ARRAY_CONTAINS('core-glossary-guid', ANCHOR))
),
-- CTE 3: Look up Glossary Names and GUIDs
glossary_lookup AS (
    SELECT
        GUID AS glossary_guid,
        NAME AS glossary_name
    FROM
        GLOSSARY_ENTITY
)
-- Main Query: Join the two term sets and return matches
SELECT DISTINCT
    T1.core_name AS source_of_truth_name,
    T1.core_guid AS source_of_truth_guid,
    T1.core_description AS source_of_truth_description,
    T2.non_core_name AS potential_duplicate_name,
    T2.non_core_guid AS potential_duplicate_guid,
    T2.non_core_description AS potential_duplicate_description,
    T3.glossary_name AS potential_duplicate_glossary_name,
    T3.glossary_guid AS potential_duplicate_glossary_guid,
    JAROWINKLER_SIMILARITY(T1.core_name, T2.non_core_name) AS similarity_score
FROM
    core_terms AS T1
INNER JOIN
    non_core_terms AS T2
    -- Adjust this similarity score if needed - a similarity score of 100 = exact match
    ON JAROWINKLER_SIMILARITY(T1.core_name, T2.non_core_name) >= 95
    AND T1.core_guid != T2.non_core_guid
INNER JOIN
    glossary_lookup AS T3
    -- Join to find the Glossary Name and GUID by checking if the Glossary GUID is in the term's ANCHOR array
    ON ARRAY_CONTAINS(T3.glossary_guid, T2.non_core_anchor_guid)
ORDER BY
    similarity_score DESC, potential_duplicate_glossary_name, source_of_truth_name;

This query returns:

Glossary terms from non-core glossaries that are similar to terms in the core glossary
Similarity scores to help prioritize which terms to review (100 = exact match)
Glossary names and descriptions for both source of truth and potential duplicates

info

Customization:

Replace 'core-glossary-guid' with the actual GUID of your core business glossary
Adjust the similarity threshold (currently set to >= 95) if needed. A similarity score of 100 = exact match

Reference

Count queries

Use these queries to get the total number of potentially duplicate metrics or glossary terms.

Number of potentially duplicate Tableau calculated fields

WITH POTENTIAL_DUPLICATES AS
(
SELECT
  t1.NAME AS "Calculated Field 1 Name",
  t1.GUID AS "Calculated Field 1 GUID",
  t1.WORKBOOKQUALIFIEDNAME AS "Calculated Field 1 Workbook",
  t1.FORMULA AS "Calculated Field 1 Formula",
  t2.NAME AS "Calculated Field 2 Name",
  t2.GUID AS "Calculated Field 2 GUID",
  t2.WORKBOOKQUALIFIEDNAME AS "Calculated Field 2 Workbook",
  t2.FORMULA AS "Calculated Field 2 Formula"
FROM
  TABLEAUCALCULATEDFIELD_ENTITY AS t1
JOIN
  TABLEAUCALCULATEDFIELD_ENTITY AS t2
ON
  (LOWER(TRIM(t1.NAME)) LIKE CONCAT(LOWER(TRIM(t2.NAME)), '%') OR t1.FORMULA = t2.FORMULA)
WHERE
  t1.GUID < t2.GUID
ORDER BY
  "Calculated Field 1 Name"
)
SELECT COUNT(DISTINCT "Calculated Field 1 GUID") FROM POTENTIAL_DUPLICATES;

Number of potentially duplicate Power BI measures

WITH POTENTIAL_DUPLICATES AS
(
SELECT
  t1.NAME "MEASURE 1 NAME",
  t1.GUID "MEASURE 1 GUID",
  t1.POWERBIMEASUREEXPRESSION "MEASURE 1 EXPRESSION",
  t2.NAME "MEASURE 2 NAME",
  t2.GUID "MEASURE 2 GUID",
  t2.POWERBIMEASUREEXPRESSION "MEASURE 2 EXPRESSION"
FROM
  POWERBIMEASURE_ENTITY AS t1
JOIN
  POWERBIMEASURE_ENTITY AS t2
ON
  LOWER(TRIM(t1.NAME)) LIKE CONCAT(LOWER(TRIM(t2.NAME)), '%')
WHERE
  t1.GUID < t2.GUID
ORDER BY
  "MEASURE 1 NAME"
)
SELECT COUNT(DISTINCT "MEASURE 1 GUID") FROM POTENTIAL_DUPLICATES;

Column reference

Column	Description
`NAME`	Metric or glossary term name
`GUID`	Unique identifier for the metric or term
`FORMULA` (Tableau)	Calculated field formula
`POWERBIMEASUREEXPRESSION` (Power BI)	Measure expression
`WORKBOOKQUALIFIEDNAME` (Tableau)	Fully qualified workbook name
`USERDESCRIPTION` (Glossary)	Term description
`ANCHOR` (Glossary)	Array containing glossary GUIDs the term belongs to
`JAROWINKLER_SIMILARITY`	Similarity score function (100 = exact match)

Before you begin​

Identify potentially duplicate Tableau calculated fields​

Identify potentially duplicate Power BI measures​

Identify potentially duplicate glossary terms​

Reference​

Count queries​

Number of potentially duplicate Tableau calculated fields​

Number of potentially duplicate Power BI measures​

Column reference​

Before you begin

Identify potentially duplicate Tableau calculated fields

Identify potentially duplicate Power BI measures

Identify potentially duplicate glossary terms

Reference

Count queries

Number of potentially duplicate Tableau calculated fields

Number of potentially duplicate Power BI measures

Column reference