Context repositories (Private Preview)

A context repository is the core unit of work in Context Engineering Studio. It's a bounded, versioned package of semantic context for a specific business domain: the tables and columns in scope, the metric definitions, the business logic, the verified question-answer pairs, and the deployment artifacts that AI agents use to answer questions accurately.

Every context repository is scoped to a domain, owned by a team, and certified before it reaches users. This model keeps AI accuracy high and keeps accountability clear.

  • Scoped, not monolithic: each repository covers one domain. Narrow scope means higher accuracy, faster iteration, and a clear owner for every definition.
  • Versioned and certified: every deployment is a snapshot. You can change the model freely in draft; nothing reaches users until a domain expert certifies the new version.
  • Engine-agnostic: a single repository deploys to Snowflake Cortex Analyst, Databricks Genie, or both. Change a metric definition once and every deployed target reflects it after the next certification.

Components

A context repository is made up of four components that move through the lifecycle together.

Semantic model
The YAML definition of your business logic for this domain: which tables are in scope, how they join, how metrics are calculated, which filters apply by default, and which synonyms map to each business term. This is what CES compiles into a Snowflake Semantic View or Databricks Metric View on deploy.
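To make the shape of a semantic model concrete, here is a minimal illustrative sketch. CES's actual YAML schema is not shown in this page, so every key, table, and metric name below is a hypothetical assumption, not the real format:

```yaml
# Illustrative sketch only -- CES's actual semantic model schema may differ.
# Table, column, and metric names are hypothetical.
tables:
  - name: opportunities
    synonyms: [deals, pipeline]
  - name: accounts
    synonyms: [customers]

relationships:
  - left: opportunities.account_id   # how the tables join
    right: accounts.id

metrics:
  - name: pipeline_value
    description: Total open opportunity amount.
    expr: SUM(opportunities.amount)
    synonyms: [open pipeline, pipeline total]

filters:
  - name: open_only                  # applied by default
    expr: opportunities.stage NOT IN ('Closed Won', 'Closed Lost')
    default: true
```

The point is the structure, not the syntax: tables in scope, joins, metric calculations, default filters, and synonym mappings all live in one versioned definition.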

Question set and simulations
A set of natural-language business questions paired with verified SQL written by a domain expert. You run the question set through a simulation to see where the semantic model answers well, where it misses, and what specific change is needed to close the gap. After deployment it becomes a permanent regression guard, re-run every time the model changes.
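A question set entry pairs one business question with the SQL a domain expert has confirmed is correct. This is an illustrative sketch only; the stored format, field names, and the sales tables are assumptions, not CES's actual schema:

```yaml
# Hypothetical question set entry -- the real stored format may differ.
- question: "What is open pipeline value by account region?"
  verified_sql: |
    SELECT a.region, SUM(o.amount) AS pipeline_value
    FROM opportunities o
    JOIN accounts a ON o.account_id = a.id
    WHERE o.stage NOT IN ('Closed Won', 'Closed Lost')
    GROUP BY a.region
  verified_by: domain-expert@example.com
```

During a simulation, the model's generated answer is compared against the verified SQL; after deployment, the same pairs are re-run as a regression check on every model change.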

Deployment artifacts
The compiled output for each target engine. For Snowflake: a CREATE OR REPLACE SEMANTIC VIEW DDL statement. For Databricks: a CREATE OR REPLACE VIEW ... WITH METRICS LANGUAGE YAML AS $$...$$ DDL per table, following the Databricks Metric View syntax. CES generates and executes these automatically on deploy.
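For a Databricks target, the YAML body inside the $$...$$ delimiters follows the Metric View specification. A minimal sketch under that assumption, with a hypothetical source table (consult the Databricks Metric Views documentation for the authoritative schema):

```yaml
# Illustrative Metric View YAML body (the part inside $$...$$).
# Source table, dimension, and measure names are hypothetical.
version: 0.1
source: sales.crm.opportunities
dimensions:
  - name: stage
    expr: stage
  - name: close_month
    expr: DATE_TRUNC('MONTH', close_date)
measures:
  - name: pipeline_value
    expr: SUM(amount)
```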

Certification history
A timestamped log of every certification: who certified, when, the certification note, and the accuracy score at the time. This gives you a full audit trail of what was deployed and by whom.
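One record in that log might look like the sketch below. The field names are hypothetical; the source only states that each entry carries the certifier, timestamp, note, and accuracy score:

```yaml
# Hypothetical shape of a single certification record.
- version: 4
  certified_by: jane.doe@example.com
  certified_at: 2025-06-12T14:03:00Z
  note: "Added pipeline_value metric; fixed accounts join."
  accuracy_score: 0.94
```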

Lifecycle

Repositories move through three stages. The stage controls what you can do with the repository and whether it's serving AI agents.

| Stage | What it means | What you can do |
| --- | --- | --- |
| Draft | Being built, refined, and tested. Not deployed. | Edit the semantic model, add question set entries, run simulations. |
| Active | Certified and deployed to at least one target engine. | Monitor in Observe (currently available for Snowflake Cortex deployments), promote traces to the question set, start a new draft. |
| Archived | Retired and no longer served to agents. | View history. Can't be redeployed without creating a new repository. |

The transition from Draft to Active requires certification. The transition from Active to Archived is manual: you archive a repository when a newer one replaces it or when the domain is no longer in scope.

Domain scoping

The most common mistake when building context repositories is making the scope too broad. A single repository that covers all company data produces poor accuracy: too many definitions collide, and too many tables compete to answer the same question.

A well-scoped repository:

  • Covers one reporting area or business domain: Sales pipeline, Finance reporting, Marketing attribution.
  • Contains 3 to 10 core tables. More is possible but accuracy degrades as the model gets wider.
  • Corresponds to an existing dashboard or report: the business already knows what questions it answers.
  • Has a clear owner: one team or one domain expert who can confirm whether an AI answer is correct.

Signs a repository is too broad:

  • It contains tables from multiple unrelated business functions.
  • No single person can confirm whether all question set answers are correct.
  • Simulations keep surfacing conflicting definitions for the same term no matter how many fixes you apply, usually a signal of competing business logic that belongs in separate repositories.

Signs a repository is too narrow:

  • Questions that span two related tables (for example, opportunities joined to accounts) can't be answered.
  • You're creating one repository per table.

The right pattern for most teams: start with one high-traffic domain, get it answering the business's real questions well, then expand to adjacent domains as separate repositories.

Versioning and certification

Certification is the gate between "work in progress" and "live for users." When you certify a repository, CES locks the semantic model to a versioned snapshot. That snapshot is what gets compiled and deployed to your target engine.

Any change to an active repository (adding a metric, fixing a join, updating a description) creates a new draft. The active snapshot keeps serving users while you work on the new version. The new version only goes live after you re-certify and re-deploy.

This means:

  • Users never see an in-progress model.
  • Every deployed version has a certification record with contributor, timestamp, and accuracy score.
  • You can roll back by deploying a previous certification snapshot if needed.

Deploying to multiple targets

A single context repository can deploy to more than one engine. After certifying, you can deploy to Snowflake Cortex Analyst and Databricks Genie independently from the same YAML definition. CES handles the format conversion automatically.

This matters for teams that run both Snowflake and Databricks workloads: one set of metric definitions, one question set, one certification process, and consistent answers across both AI surfaces.

Relationships between repositories

Repositories are independently versioned and independently deployed. Changing one doesn't affect others. But the business terms you use across repositories matter: if "revenue" means something different in the Sales repository than in the Finance repository, users get different answers from different AI surfaces for what feels like the same question.

The Atlan glossary is the shared source of truth. Business terms defined in the glossary and linked to columns in a context repository carry consistent meaning across every repository that uses them. When Context Engineering Studio enriches a new repository, it maps columns to the same glossary terms, automatically reducing definition drift across domains.

See also