
Run simulations (Private Preview)


A simulation runs an autogenerated question set on your context repository and reports which questions pass, which fail, and what's missing. Each failure points to a specific fix: a missing description, a misrouted join, a missing filter, an out-of-scope asset, or a missing synonym. You describe the fix in the chat window, the agent applies it, and you re-run until results are stable enough to confirm the repository is accurate and ready to deploy.

Your repository also includes a quality-report.md with per-dimension eval scores and recommended fixes. Review it alongside simulation results to get a complete picture of what needs attention before certifying.

Prerequisites

Before you begin, make sure:

Run simulations

  1. In Context Engineering Studio, open your context repository and click the Simulate tab.

  2. Click Pick a collection, then select a collection from the list. Collections are autogenerated based on your repository's assets, metrics, and domain scope. Each collection contains representative questions that test your semantic model's coverage and accuracy.

  3. After you select a collection, the simulation starts. It can take a few minutes. When it completes, you see:

    • Overall Score: The percentage of questions answered correctly and the number of questions that passed.
    • Test Results: Individual results for each question, including accuracy score, latency, and the persona that question represents.
    • View SQL: The SQL generated for each question, so you can see exactly what the model produced.
  4. Read the per-question diagnostics, not just the aggregate score. Each failing question tells you something specific about what the model is missing.

  5. Use the chat window to fix failing questions. Describe what needs to change and the agent updates the repository for you, refining definitions, metrics, filters, joins, and relationships without manual YAML edits.

  6. Re-run the simulation and review the results. Check whether the questions you targeted now pass, and whether any previously passing questions regressed. Adjust and re-run until the results are stable.

    Databricks simulations

    On Databricks, simulations run sequentially because of Genie API rate limits (5 queries per minute per workspace by default). Expect longer run times than on Snowflake for the same question set. Databricks can raise this quota on request.
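To get a feel for what the default quota means, here is a rough estimate of the time a run spends waiting on the rate limit. It is a sketch that assumes each simulated question issues exactly one Genie query and ignores query execution and scoring time, so real runs take longer. The function names are hypothetical; Context Engineering Studio schedules the queries for you.

```python
# Assumption: each simulated question issues exactly one Genie query.
# 5 queries per minute is the default per-workspace limit quoted above.
DEFAULT_GENIE_QPM = 5


def seconds_between_queries(queries_per_minute: int = DEFAULT_GENIE_QPM) -> float:
    """Spacing between sequential queries that keeps a run within the quota."""
    if queries_per_minute <= 0:
        raise ValueError("queries_per_minute must be positive")
    return 60.0 / queries_per_minute


def estimated_rate_limited_minutes(num_questions: int,
                                   queries_per_minute: int = DEFAULT_GENIE_QPM) -> float:
    """Rough minutes spent pacing queries: num_questions / queries_per_minute.

    Query execution and scoring time are not included.
    """
    if num_questions < 0 or queries_per_minute <= 0:
        raise ValueError("num_questions must be >= 0 and queries_per_minute > 0")
    return num_questions / queries_per_minute


if __name__ == "__main__":
    print(seconds_between_queries())               # 12.0
    print(estimated_rate_limited_minutes(40))      # 8.0
    print(estimated_rate_limited_minutes(40, 20))  # 2.0, with a raised quota
```

The estimate scales linearly with the question count, which is why raising the workspace quota shortens Databricks runs roughly in proportion.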

Once simulations consistently surface only out-of-scope questions and your domain expert is satisfied with the results, your context repository is ready to deploy.
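If you record per-question results between runs (for example, by noting them from Test Results), a small comparison makes regressions easy to spot. This is a minimal sketch: the dict of question IDs to statuses and the `out_of_scope` status label are assumptions for illustration, not a format the Simulate tab exports. It compares only two runs, so "consistently" still means repeating the check across several re-runs, and it does not replace your domain expert's review.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical status values; the Simulate tab does not necessarily use these labels.
PASS, FAIL, OUT_OF_SCOPE = "pass", "fail", "out_of_scope"


@dataclass
class RunComparison:
    score_pct: float               # percentage of questions passing in the current run
    newly_passing: list[str]       # did not pass before (or were absent), pass now
    regressed: list[str]           # passed before, no longer pass
    only_out_of_scope_left: bool   # every non-passing question is out of scope


def compare_runs(previous: dict[str, str], current: dict[str, str]) -> RunComparison:
    """Compare two simulation runs keyed by a (hypothetical) question ID."""
    total = len(current)
    passing = sum(1 for status in current.values() if status == PASS)
    score_pct = 100.0 * passing / total if total else 0.0

    newly_passing = sorted(
        q for q, s in current.items() if s == PASS and previous.get(q) != PASS
    )
    regressed = sorted(
        q for q, s in current.items() if s != PASS and previous.get(q) == PASS
    )
    only_out_of_scope_left = all(s in (PASS, OUT_OF_SCOPE) for s in current.values())
    return RunComparison(score_pct, newly_passing, regressed, only_out_of_scope_left)


if __name__ == "__main__":
    before = {"q1": PASS, "q2": FAIL, "q3": PASS, "q4": FAIL}
    after = {"q1": PASS, "q2": PASS, "q3": FAIL, "q4": OUT_OF_SCOPE}
    result = compare_runs(before, after)
    print(result.score_pct)               # 50.0
    print(result.newly_passing)           # ['q2']
    print(result.regressed)               # ['q3']
    print(result.only_out_of_scope_left)  # False: q3 is an ordinary failure
```

A fix that makes q2 pass but breaks q3 shows up immediately in `regressed`, which is the signal to adjust before re-running.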

How to read failures

Each failing question maps to a specific kind of model gap. Use the shape of the failure to decide what to fix:

| Failure shape | Likely cause | Fix |
|---|---|---|
| Paraphrases produce different answers | A synonym is missing. | Add the alternate phrasing to the relevant column or metric. |
| Agent chose an adjacent table | The preferred asset isn't flagged strongly enough. | Boost the canonical asset's description, or downrank the alternate. |
| Same metric, different numbers under different filters | Two competing definitions exist in the model. | Reconcile to a single definition, or scope each filter explicitly. |
| Question can't be answered at all | An asset or relationship is out of scope. | Add the asset to the repository, or accept it as out of scope. |
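For the first failure shape, one way to spot missing synonyms is to group questions that ask the same thing in different words and flag groups whose answers disagree. The sketch below assumes you have labeled the paraphrase groups yourself and normalized each answer to a comparable string; the function name, group IDs, and sample data are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable


def inconsistent_paraphrase_groups(
    results: Iterable[tuple[str, str, str]],
) -> list[str]:
    """Return paraphrase groups whose questions produced more than one distinct answer.

    Each tuple is (group_id, question_text, normalized_answer). Group IDs are
    hypothetical labels you assign to questions that ask the same thing.
    """
    answers_by_group: dict[str, set[str]] = defaultdict(set)
    for group_id, _question, answer in results:
        answers_by_group[group_id].add(answer)
    # More than one distinct answer matches the "paraphrases produce different
    # answers" failure shape, which usually means a synonym is missing.
    return sorted(g for g, answers in answers_by_group.items() if len(answers) > 1)


if __name__ == "__main__":
    sample = [
        ("revenue_q3", "What was revenue in Q3?", "1.2M"),
        ("revenue_q3", "What were Q3 sales?", "0.9M"),
        ("active_users", "How many active users last week?", "4310"),
        ("active_users", "Count of weekly active users?", "4310"),
    ]
    print(inconsistent_paraphrase_groups(sample))  # ['revenue_q3']
```

In this hypothetical sample, "sales" and "revenue" resolve to different numbers, so the next step would be to add "sales" as a synonym on the relevant metric.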

Next steps