Run simulations (Private Preview)

This guide walks you through running simulations on your semantic model, reading the results, and applying fixes. Each simulation run surfaces where the model answers well and where it needs more context, such as missing descriptions, joins, filters, or additional assets.

Prerequisites

Before you begin, make sure:

Run simulation

  1. In Context Engineering Studio, open your context repository and click the Simulate tab.

  2. Review the question set on the Simulate tab. If no questions are loaded yet, set them up first. See Create your question set.

  3. Click Run evaluation. CES sends every question through the engine, captures the SQL the engine generates, and compares the generated result set to your verified SQL's result set.

    On Databricks, simulations run sequentially due to Genie API rate limits (5 queries per minute per workspace by default). Expect longer run times than on Snowflake for the same question set. Databricks can raise this quota on request.
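To estimate a Databricks run time up front, divide the question count by the quota. A minimal sketch of that arithmetic (the 5-queries-per-minute figure is the default quota mentioned above; real runs also add per-query engine latency, which this ignores):

```python
# Lower bound on wall-clock time for a sequential, rate-limited simulation run.
# Databricks default quota: 5 Genie queries per minute per workspace.
def estimated_minutes(num_questions: int, queries_per_minute: int = 5) -> float:
    """Minimum minutes needed to send every question through the engine."""
    return num_questions / queries_per_minute

# A 100-question set needs at least 100 / 5 = 20 minutes on the default quota.
print(estimated_minutes(100))  # → 20.0
```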

  4. Read the per-question diagnostics, not just the aggregate score. Each failing question tells you something specific about what the model is missing.

  5. Resolve conflicts that surface. If the same metric name produces different results depending on the asset path, specify a canonical formula in the YAML. If a term maps to multiple columns, tighten the description or synonym list to remove the ambiguity. If a question can't be answered, add the missing asset to the repository or mark it as out of scope.
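Pinning a canonical formula might look like the following. This is an illustrative sketch only: the keys shown (`metrics`, `name`, `description`, `expr`) are placeholders, not a guaranteed schema, so follow the structure your repository already uses:

```yaml
# Illustrative sketch: pin one canonical definition for "revenue"
# so every asset path resolves the metric the same way.
metrics:
  - name: revenue
    description: Gross revenue from completed transactions.
    expr: SUM(order_amount)  # expression and column name are assumed examples
```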

  6. Fix the failures affecting the most questions first. Use Chat & build to describe the change in plain English, apply a structured suggestion from the fix agent, or edit the YAML directly. For example:

    The revenue metric is returning results that include refunded transactions. Exclude rows where transaction_status is 'refunded'.

    For a full list of fix actions and diagnostic types, see Simulation diagnostics.
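The refund fix described in the example prompt might translate into a metric filter along these lines. Again a sketch with assumed structure; only the `transaction_status` column and `'refunded'` value come from the example, and the `filters` key is a placeholder for whatever your schema uses:

```yaml
# Illustrative sketch: exclude refunded rows from the revenue metric.
metrics:
  - name: revenue
    expr: SUM(order_amount)  # expression is an assumed example
    filters:
      - transaction_status <> 'refunded'
```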

  7. Re-run the simulation and review the results. Check whether the questions you targeted now pass, and whether any previously passing questions regressed. For example, if five questions about monthly revenue now pass but a question about quarterly revenue regressed, the filter added in the previous step may be scoped too narrowly. Adjust and re-run until the results are stable.
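The regression check in the last step amounts to comparing the sets of passing questions between runs. A minimal sketch, assuming you have exported each run's passing question IDs yourself (CES does not necessarily expose them in this form):

```python
# Compare two simulation runs to spot regressions and newly passing questions.
def diff_runs(before_pass: set[str], after_pass: set[str]) -> dict[str, set[str]]:
    """Questions that regressed (passed before, fail now) or newly pass."""
    return {
        "regressed": before_pass - after_pass,
        "newly_passing": after_pass - before_pass,
    }

# Hypothetical question IDs, mirroring the monthly/quarterly revenue example.
before = {"monthly_revenue_q1", "monthly_revenue_q2", "quarterly_revenue"}
after = {"monthly_revenue_q1", "monthly_revenue_q2", "monthly_revenue_q3"}
print(diff_runs(before, after))
# "quarterly_revenue" regressed; "monthly_revenue_q3" newly passes.
```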

Once simulations consistently surface only out-of-scope questions and your domain expert is satisfied with the results, your context repository is ready to deploy.

Next steps