Detect duplicate assets

You can use the Duplicate detector app to find assets that may be duplicates of one another. The app compares the column sets of assets to identify matches.

The workflow scans assets within a defined scope, groups those that share identical columns, regardless of case or order, and records each group as a glossary term linked to all matching assets. This guide covers how to configure the workflow, including the workflow name, glossary, qualified name prefix, and additional options, and how to review and act on the results in the glossary.
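
The grouping idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual implementation: it assumes that "identical columns" means identical column names, and the function names and sample assets are hypothetical.

```python
from collections import defaultdict


def normalize_columns(columns):
    """Build a key that ignores both case and order of the column names."""
    return frozenset(name.lower() for name in columns)


def group_duplicates(assets):
    """Group asset names by normalized column set; keep groups of two or more."""
    groups = defaultdict(list)
    for asset_name, columns in assets.items():
        groups[normalize_columns(columns)].append(asset_name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical sample data: asset name -> column names.
assets = {
    "SALES.ORDERS": ["ID", "Customer_ID", "Amount"],
    "REPORTING.ORDERS_COPY": ["amount", "id", "customer_id"],
    "SALES.CUSTOMERS": ["ID", "NAME"],
}

duplicate_groups = group_duplicates(assets)
print(duplicate_groups)
```

In this sketch, the two order tables land in the same group because their column names match once case and order are ignored, while the customers table has no match and is left out. Each resulting group corresponds to what the workflow would record as one glossary term.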

Prerequisites

Before you begin, make sure you have:

  • Access to the Duplicate detector app in your Atlan workspace. If you don't see it when creating a new workflow, contact your Atlan administrator or Atlan support.
  • At least one connection with assets already crawled into Atlan so that there are tables or views to evaluate for potential duplicates.
  • A clear scope for the scan (for example, the connection, database, schema, or catalog you want to include).

Configure workflow

  1. In your Atlan workspace, go to the homepage and click New workflow in the top navigation bar.

  2. Search for Duplicate detector, and then select Set up workflow.

  3. In the Workflow name field, enter a clear and descriptive name that reflects the scope of the duplicate scan.
    This name appears in the workflow list and helps distinguish different configurations or environments. For example:

    duplicate-detector-sales-warehouse
  4. In the Glossary name field, enter a name for the glossary where the duplicate sets of assets are recorded and tracked. If the glossary doesn't yet exist, the workflow creates it. If it already exists, the workflow updates it with the new results.

  5. In Qualified name prefix, provide the starting value of the qualifiedName for the assets you want to scan. The workflow only evaluates assets whose qualified names begin with this prefix, so you can control the scope of the scan (for example, a specific connection, database, schema, or domain).

    For example, if you want to scan all tables and views under a Snowflake reporting schema, use a prefix similar to:

    default/snowflake/.../REPORTING_DB/REPORTING_SCHEMA
  6. In Options, select Default to run the workflow with the recommended settings for duplicate detection. If you need environment-specific behavior or custom settings provided by Atlan, select Advanced.

  7. Schedule and run the workflow. Run it once to identify potential duplicates in your selected scope, or set a recurring schedule to continuously detect new duplicates as assets are added or updated.

The Duplicate detector app scans the assets that match your configured qualifiedName prefix and asset types, groups assets with identical column sets, and updates the glossary you specified with the results.
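
To reason about which assets a given prefix would include, the scoping rule can be approximated as a plain "starts with" check. The sketch below assumes simple string-prefix matching, which is what the description suggests but which this example doesn't verify against the app. The qualified names are hypothetical, and `CONN_ID` is a placeholder for the connection segment rather than a real value.

```python
def filter_by_prefix(qualified_names, prefix):
    """Keep only the qualified names that begin with the given prefix."""
    return [qn for qn in qualified_names if qn.startswith(prefix)]


# Hypothetical qualified names; CONN_ID is a placeholder, not a real identifier.
qualified_names = [
    "default/snowflake/CONN_ID/REPORTING_DB/REPORTING_SCHEMA/ORDERS",
    "default/snowflake/CONN_ID/REPORTING_DB/REPORTING_SCHEMA_ARCHIVE/ORDERS",
    "default/snowflake/CONN_ID/SALES_DB/PUBLIC/ORDERS",
]

prefix = "default/snowflake/CONN_ID/REPORTING_DB/REPORTING_SCHEMA"

# A bare schema prefix also matches sibling schemas whose names start the same way.
in_scope = filter_by_prefix(qualified_names, prefix)

# Adding a trailing slash limits the match to assets inside that one schema.
narrow_scope = filter_by_prefix(qualified_names, prefix + "/")

print(len(in_scope), len(narrow_scope))
```

Under plain prefix matching, a prefix that ends at a schema name can pull in similarly named schemas, so it's worth checking the scope against a few known qualified names before scheduling the workflow.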

Review duplicate groups

  1. After you configure and run the workflow, Duplicate detector writes the results to the glossary you specified in Glossary name (by default, Duplicate assets).

    Each detected duplicate group is created as a glossary term and linked to the matching assets, so you can review duplicates from one place.

  2. In Atlan, go to Glossary and open the glossary you configured (for example, Duplicate assets).

  3. Search for terms named like Dup. (00000000) and open a term to see the list of linked assets in that duplicate group. Use the linked assets to compare duplicates and decide next steps (for example, consolidate, retire, or document differences).

Need help?

If you encounter issues when configuring or running the workflow, or if you need help interpreting the results, contact Atlan support.

See also

  • Search and discover assets: Learn how to search for and filter assets, including those linked to duplicate-detection glossary terms.
  • Link terms to assets: Understand how glossary terms appear on asset profiles and how they help provide business context.