Improve Atlan MCP search quality

Atlan MCP returns more relevant results when your catalog metadata is rich and trustworthy. MCP searches the metadata in your catalog, scores candidate assets on semantic relevance, trust signals, and usage, then returns the highest-ranked matches. Enriching that metadata is the single biggest lever you control over search quality.

This page explains how MCP ranks results, which metadata signals move the needle, and how to enrich your catalog at scale.

The semantic search engine behind Atlan MCP is the same one that powers conversational AI in Atlan's UI, so everything on this page improves results in MCP, in conversational AI, and in Atlan's in-product search experience.

How Atlan MCP ranks search results

When several assets can answer the same question, MCP doesn't pick one at random. It scores candidates on three factors and returns the highest-ranked matches:

  1. Semantic relevance: how closely the meaning of an asset's name, description, README, and linked glossary terms matches the intent of the question. Semantic search handles synonyms, abbreviations, typos, and multi-word names, so a question about "customer churn" can match a table described as "monthly subscriber attrition." If the matching language isn't anywhere in the asset's metadata, MCP can't match the asset well, which is why descriptions and READMEs matter so much.
  2. Trust signals: enrichment that marks an asset as governed and reliable. When two assets match a query equally well on meaning, trust signals decide which one wins. See the five trust signals below.
  3. Usage and popularity: how frequently an asset is queried or accessed (available for sources such as Snowflake and Google BigQuery). A heavily used table is more likely to be the one people actually want.

The practical takeaway: enrichment is the single biggest lever you control. Semantic relevance gives MCP the language to find an asset; trust signals and usage tell it which matching asset to prefer.
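Atlan doesn't document here exactly how MCP combines these three factors, so the following is only a toy illustration of the ordering described above. It assumes a simple lexicographic sort: semantic relevance first, then the number of trust signals, then usage. The `Candidate` fields and the two-decimal tie tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical search candidate; fields are illustrative, not Atlan's schema."""
    name: str
    semantic_score: float  # assumed 0..1 similarity between question and metadata
    trust_signals: int     # how many of the five trust signals are present (0..5)
    query_count: int       # usage, for example queries over a recent window


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Toy ranking: meaning first, trust breaks ties, usage breaks remaining ties."""
    return sorted(
        candidates,
        # Rounding treats scores that agree to two decimals as an equal match.
        key=lambda c: (round(c.semantic_score, 2), c.trust_signals, c.query_count),
        reverse=True,
    )


candidates = [
    Candidate("analytics.rev_tmp", 0.91, trust_signals=0, query_count=1200),
    Candidate("analytics.revenue_daily", 0.91, trust_signals=4, query_count=300),
    Candidate("staging.orders", 0.40, trust_signals=5, query_count=9000),
]
for c in rank(candidates):
    print(c.name)
# prints analytics.revenue_daily, then analytics.rev_tmp, then staging.orders
```

In this toy, the curated `revenue_daily` beats the uncurated `rev_tmp` despite lower usage because trust is compared before usage, and neither certification nor heavy usage rescues `staging.orders` when its meaning doesn't match. A production ranker may weight the factors differently.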

Five trust signals to prioritize

What they are: Trust signals are pieces of metadata that mark an asset as governed, maintained, and reliable, such as a certificate, an owner, or a description. Each one reflects a deliberate act of curation by a person.

Why they matter: When two assets match a question equally well on meaning, trust signals decide which one MCP returns first. Atlan treats curated assets as more trustworthy, so these signals lift trusted assets above unowned, undocumented, or near-duplicate matches. The five listed here have the largest effect, so prioritize them on your most important assets first.

| # | Trust signal | Why it improves relevance | How to add it |
|---|---|---|---|
| 1 | Certification | A Verified certificate marks an asset as trusted and ranks it above uncertified matches. Deprecated is equally valuable: it steers MCP and users away from retired assets so the current asset wins. Both states signal active stewardship. | Add certificates, or set the certificate from chat (see the prompts below). |
| 2 | Description | The primary text MCP matches a question against. A clear, business-meaningful description widens the range of questions an asset can answer. | Add descriptions manually, with Context Agents, or through MCP. |
| 3 | Ownership | An assigned owner (individual, group, or team) signals that the asset is maintained and accountable. Owned assets rank higher than orphaned ones. | Assign owners (users or groups) to the asset. |
| 4 | Linked assets | Glossary terms, related assets, and lineage connect an asset to business context. Linked glossary terms in particular expand the vocabulary MCP can match against. | Link glossary terms and keep lineage healthy. |
| 5 | README | Long-form context (examples, sample queries, caveats, definitions, and grain) that a one-line description can't hold. READMEs provide a rich matching surface and are the content MCP returns to explain an asset. | Add a README to the asset, manually or with AI assistance. |

You don't need to enrich everything at once. Focus these signals on your highest-value assets first: certified golden datasets, the tables behind key dashboards, and the terms your business asks about most.
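To apply the table above to a specific asset, it helps to list which trust signals it lacks. This is a minimal sketch, assuming asset metadata has already been exported into plain dictionaries; the keys (`certificate`, `owners`, `linked_terms`, and so on) and the certificate labels are hypothetical, not Atlan attribute names. Following the table, it counts either a Verified or a Deprecated certificate as present.

```python
TRUST_SIGNALS = ("certificate", "description", "owners", "linked_terms", "readme")
# Both states signal active stewardship; other states (for example, a draft) do not.
STEWARDED_CERTIFICATES = {"VERIFIED", "DEPRECATED"}


def missing_trust_signals(asset: dict) -> list[str]:
    """Return the trust signals an asset lacks, in the table's priority order."""
    missing = []
    for signal in TRUST_SIGNALS:
        value = asset.get(signal)
        if signal == "certificate":
            present = value in STEWARDED_CERTIFICATES
        elif isinstance(value, str):
            present = bool(value.strip())  # whitespace-only text doesn't count
        else:
            present = bool(value)  # None or an empty list counts as missing
        if not present:
            missing.append(signal)
    return missing


orders = {
    "certificate": "DRAFT",
    "description": "One row per customer order.",
    "owners": [],
    "linked_terms": ["Order"],
    "readme": "",
}
print(missing_trust_signals(orders))  # ['certificate', 'owners', 'readme']
```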

Enrich other high-value metadata

In addition to the five trust signals, these fields add business context and precision, and they cover what customers most often ask about (custom fields, classifications, tags, and business context). Semantic search can only match language that exists in your metadata, so spell out the synonyms, abbreviations, and everyday terms your team actually uses. If people call it "the rev table," make sure those words live somewhere in the asset's metadata.

| Metadata to add | What it does for MCP search |
|---|---|
| Classifications and tags (for example, PII, Confidential, domain or quality tags) | Lets MCP filter by sensitivity and governance, and adds searchable business labels. |
| Custom metadata (for example, Governance Status, Data Quality Score, System of Record) | Captures structured business and governance context that standard fields don't cover, all of it queryable. |
| Domains and data products | Groups assets into business context (for example, Finance, Marketing) so MCP can scope results to the right area. |
| Aliases | Alternate names your team uses for an asset, so search matches the words people actually type. |
| Announcements | Flags deprecations and incidents inline, steering users to the right replacement asset. |
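Because semantic search can only match language that exists somewhere in the metadata, a useful check is whether the words your team actually uses appear in an asset's name, description, README, or aliases. The sketch below is a deliberately literal audit (lowercased whole-word matching), not how MCP itself matches; the input strings and terms are hypothetical examples you would pull from your own catalog.

```python
import re


def _normalize(text: str) -> str:
    # Lowercase, split on anything that isn't a letter or digit (so REVENUE_DAILY
    # becomes "revenue daily"), and pad with spaces so every word is delimited.
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def uncovered_terms(metadata_texts: list[str], team_terms: list[str]) -> list[str]:
    """Return team vocabulary that appears nowhere in the asset's metadata."""
    haystack = _normalize(" ".join(metadata_texts))
    return [term for term in team_terms if _normalize(term) not in haystack]


metadata = ["REVENUE_DAILY", "Daily recognized revenue by region.", "Aliases: rev table"]
print(uncovered_terms(metadata, ["rev table", "revenue", "ARR"]))  # ['ARR']
```

Any term it reports is a candidate for an alias or for a sentence in the asset's description.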

Ways to enrich your catalog

These paths are complementary. Start with the most scalable option, then reserve hands-on curation for your highest-stakes assets.

Start with Context Agents

What to do: Use Context Agents to enrich metadata at volume across your most important assets.

Why it matters: It's the most scalable enrichment path and the best place to start. The agents enrich the same metadata MCP searches over, so every accepted suggestion improves MCP relevance downstream.

How to implement:

  1. Open Context Agents Studio.
  2. Let the agents read evidence from your Enterprise Data Graph: lineage, SQL patterns, usage signals, and existing business definitions.
  3. Review and approve the generated context: descriptions, READMEs, glossary term linkages, and SQL intelligence.

Enrich in bulk with MCP

What to do: Use MCP to find under-enriched assets and fix them in the same prompt, directly from your AI client.

Why it matters: Each prompt is a compound action that finds the gap and enriches it in one pass, with no switching to the UI.

How to implement: Prioritize by popularity × incompleteness, since high-traffic, under-enriched assets hurt search quality the most. The example prompts below are starting points; adapt the schema names, owners, and terms to your catalog. For more, see the full metadata enrichment workflows.
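One way to make popularity × incompleteness concrete is a per-asset score: usage multiplied by the fraction of key signals that are missing. This is a hypothetical sketch; the signal list, the `query_count` field, and equal weighting of signals are assumptions to tune for your catalog.

```python
SIGNALS = ("description", "owners", "certificate", "readme", "linked_terms")


def incompleteness(asset: dict) -> float:
    """Fraction of the assumed key signals that are empty or absent (0.0 to 1.0)."""
    missing = sum(1 for signal in SIGNALS if not asset.get(signal))
    return missing / len(SIGNALS)


def enrichment_queue(assets: list[dict], limit: int = 20) -> list[dict]:
    """Order assets by query_count * incompleteness, highest first."""
    scored = [(asset.get("query_count", 0) * incompleteness(asset), asset) for asset in assets]
    # Fully enriched or never-queried assets score 0 and are left out of the queue.
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [asset for _, asset in scored[:limit]]


assets = [
    {"name": "orders", "query_count": 5000, "description": "One row per order.", "owners": ["data-eng"]},
    {"name": "rev_tmp", "query_count": 40},
    {"name": "customers", "query_count": 8000, "description": "One row per customer.",
     "owners": ["crm-team"], "certificate": "VERIFIED", "readme": "Grain: one row per customer.",
     "linked_terms": ["Customer"]},
]
print([a["name"] for a in enrichment_queue(assets)])  # ['orders', 'rev_tmp']
```

Multiplying rather than adding means an asset nobody queries, or one that is already complete, drops out of the queue entirely, which matches the goal of fixing high-traffic gaps first.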

Examples:

Example 1: Fill the biggest gaps in a high-traffic schema.

Find the 20 most-queried tables in the Snowflake ANALYTICS schema that are missing a description, an owner, or certification. For each one, draft a clear business description from its columns and lineage, assign the schema's data steward as owner, and mark it Verified once it has both.

Example 2: Propagate context from a trusted source down its lineage.

Starting from the certified CUSTOMERS table, walk its downstream lineage and copy its description, linked glossary terms, and PII tags onto any downstream table or dashboard that is missing them.
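The lineage walk in Example 2 is essentially a breadth-first traversal that fills only empty fields. The sketch below illustrates that rule on an in-memory graph, assuming lineage is available as a mapping from each asset to its direct downstream assets; the field names (`description`, `linked_terms`, `pii_tags`) and asset names are hypothetical, and this is an illustration of the rule, not MCP's implementation.

```python
from collections import deque

FIELDS = ("description", "linked_terms", "pii_tags")


def propagate(source: str, downstream: dict[str, list[str]], catalog: dict[str, dict]) -> list[str]:
    """Copy FIELDS from source to every downstream asset missing them; return updated names."""
    origin = catalog[source]
    updated = []
    seen = {source}
    queue = deque(downstream.get(source, []))
    while queue:
        name = queue.popleft()
        if name in seen:  # lineage can share nodes or loop; visit each asset once
            continue
        seen.add(name)
        target = catalog[name]
        changed = False
        for field in FIELDS:
            # Never overwrite existing context; fill only empty fields.
            if not target.get(field) and origin.get(field):
                value = origin[field]
                target[field] = list(value) if isinstance(value, list) else value
                changed = True
        if changed:
            updated.append(name)
        queue.extend(downstream.get(name, []))
    return updated


catalog = {
    "CUSTOMERS": {"description": "One row per customer.", "linked_terms": ["Customer"], "pii_tags": ["PII"]},
    "CUSTOMER_ORDERS": {"description": "Orders joined to customers."},
    "CHURN_DASHBOARD": {},
}
downstream = {"CUSTOMERS": ["CUSTOMER_ORDERS", "CHURN_DASHBOARD"], "CUSTOMER_ORDERS": ["CHURN_DASHBOARD"]}
updated = propagate("CUSTOMERS", downstream, catalog)
print(updated)  # ['CUSTOMER_ORDERS', 'CHURN_DASHBOARD']
print(catalog["CUSTOMER_ORDERS"]["description"])  # Orders joined to customers.
```

The `seen` set keeps shared or cyclic lineage from being processed twice, and the fill-only-if-empty check preserves any description a downstream owner already wrote.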

Example 3: Standardize a contested metric into one glossary term.

Find every asset that references 'active user' or 'MAU', draft one canonical glossary term from how those assets define it, create the term, and link all of them to it.
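The first step of Example 3, finding every asset that mentions a metric under any of its names, amounts to a case-insensitive whole-word search. A minimal sketch, assuming each asset's metadata has been flattened into one string; the asset names and texts are made up for illustration.

```python
import re


def assets_referencing(asset_texts: dict[str, str], variants: list[str]) -> list[str]:
    """Return names of assets whose metadata mentions any variant, sorted."""
    alternation = "|".join(re.escape(v) for v in variants)
    # Whole words only, any case, with an optional plural "s" ("MAUs", "active users").
    pattern = re.compile(rf"\b(?:{alternation})s?\b", re.IGNORECASE)
    return sorted(name for name, text in asset_texts.items() if pattern.search(text))


asset_texts = {
    "dau_mau_ratio": "Ratio of daily active users to MAUs.",
    "user_events": "Events for each active user session.",
    "orders": "One row per order.",
}
print(assets_referencing(asset_texts, ["active user", "MAU"]))  # ['dau_mau_ratio', 'user_events']
```

Each asset in the result is a candidate to link to the canonical term once it exists. A literal search like this misses paraphrases such as "monthly actives," which is where semantic search and human review still matter.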

Manually document critical assets

What to do: Curate descriptions, READMEs, owners, certification, and glossary links by hand for your highest-value assets.

Why it matters: For assets where accuracy is non-negotiable (certified golden datasets, regulatory tables, and the assets behind executive dashboards), hands-on curation gives you the most control. It's the least scalable path, so reserve it for assets that truly warrant the attention.

See also: add descriptions.

Next steps