Context repository YAML schema Private Preview

Context Engineering Studio generates different YAML formats depending on your target engine. Select your engine below.

Snowflake Cortex Analyst
Databricks Metric View

A complete Snowflake Cortex Analyst semantic model with all major sections. Inline comments explain each field.

name: SALES_PIPELINE                         # repo identifier, UPPER_SNAKE_CASE

tables:
  - name: OPPORTUNITIES                      # logical alias used in exprs
    description: "Active and closed sales opportunities tracked in Salesforce."
    base_table:
      database: PROD_DB
      schema: SALES
      table: OPPORTUNITIES
    dimensions:
      - name: STAGE_NAME
        expr: STAGE_NAME
        data_type: VARCHAR
        description: "Current stage of the opportunity in the sales cycle."
        sample_values: ["Prospecting", "Negotiation", "Closed Won"]
        synonyms: ["stage", "pipeline stage", "deal stage"]
    time_dimensions:
      - name: CLOSE_DATE
        expr: CLOSE_DATE
        data_type: DATE
        description: "Date the opportunity is expected to close or was closed."
        synonyms: ["close date", "expected close", "deal close"]
    metrics:
      - name: TOTAL_ARR
        expr: "SUM(ANNUAL_RECURRING_REVENUE)"
        description: "Sum of annual recurring revenue across all opportunities."
        access_modifier: public_access
    filters:
      - name: OPEN_OPPORTUNITIES
        synonyms: ["active", "open deals", "in-flight"]
        description: "Filters to opportunities that have not yet been closed."
        expr: "STAGE_NAME NOT IN ('Closed Won', 'Closed Lost')"

module_custom_instructions:
  sql_generation: "ALWAYS filter by CLOSE_DATE when questions mention a time period."
  question_categorization: "Questions about pipeline health refer to open opportunities only."

Descriptions are required for accuracy (Snowflake)

Per Snowflake's Cortex Analyst best practices, high-quality descriptions aren't optional, they're required. The YAML schema marks dimension and column descriptions as optional for syntactic reasons, but deploying without them measurably hurts answer quality. Write descriptions that explain proprietary terms and abbreviations explicitly; don't assume shared knowledge.

Top-level fields

Field	Type	Required	Description
`name`	string	Yes	Identifier for the semantic model. Use `UPPER_SNAKE_CASE`. Must be unique within the deployment target.
`tables`	array	Yes	One or more table definitions. See Tables. Start with 5–10 tables for a POC, scale to more focused semantic views rather than one wide view.
`module_custom_instructions`	object	No	Free-text SQL generation hints passed to Cortex Analyst at query time. See Custom instructions.

`tables[]`

Each entry in tables defines one logical table, the mapping between a physical Snowflake table and the business concepts exposed from it.

Field	Type	Required	Description
`name`	string	Yes	Logical alias for this table. Referenced by `expr` fields in dimensions and metrics. `UPPER_SNAKE_CASE`.
`description`	string	Yes	Business-friendly description of what this table represents. Present tense. 200 characters maximum.
`base_table`	object	Yes	Fully qualified physical table. See `base_table`.
`dimensions`	array	No	Categorical or text attributes. See `dimensions[]`.
`time_dimensions`	array	No	Date or timestamp columns for time-series queries. See `time_dimensions[]`.
`metrics`	array	No	Aggregate measures. See `metrics[]`.
`filters`	array	No	Named reusable WHERE predicates. See `filters[]`.

`base_table`

Field	Type	Required	Description
`database`	string	Yes	Snowflake database name.
`schema`	string	Yes	Schema within the database.
`table`	string	Yes	Physical table or view name.

`dimensions[]`

Dimensions are categorical, text, or numeric attributes used to group and filter results.

Field	Type	Required	Description
`name`	string	Yes	Logical name. `UPPER_SNAKE_CASE`. Must be unique within the table.
`expr`	string	Yes	Column reference in the physical table. Use the exact column name as it appears in Snowflake.
`data_type`	string	No	Snowflake data type. Common values: `VARCHAR`, `NUMBER`, `BOOLEAN`, `DATE`, `TIMESTAMP_TZ`.
`description`	string	No	Business-friendly description. Present tense. 200 characters maximum.
`sample_values`	array	No	Representative string values. Three or more recommended. Helps Cortex Analyst understand the domain of this column.
`synonyms`	array	No	Lowercase alternative names users might say. Use sparingly, see note below. Don't leave this as an empty array, omit the field entirely if you have no synonyms.

tip

sample_values are especially useful for low-cardinality columns like status fields, region codes, and product categories. They anchor Cortex Analyst's filter generation to real data values.

Snowflake synonym guidance

Per Snowflake's Cortex Analyst best practices, avoid synonyms unless the term is unique or industry-specific. Generic synonyms consume tokens without meaningful accuracy improvement. Prefer a sharper description to disambiguate terms when possible.

`time_dimensions[]`

Time dimensions are date or timestamp columns intended for time-series filtering and aggregation.

Field	Type	Required	Description
`name`	string	Yes	Logical name. `UPPER_SNAKE_CASE`. Must be unique within the table.
`expr`	string	Yes	Column reference. Use the exact column name.
`data_type`	string	Yes	Must be `DATE` or `TIMESTAMP_TZ(9)`.
`description`	string	No	Business-friendly description. Present tense. 200 characters maximum.
`synonyms`	array	No	Lowercase alternative names. Two to five terms. Omit rather than leaving empty.

note

Cortex Analyst uses the data_type of time dimensions to decide how to apply date truncation and range filters. Use DATE for calendar date columns. Use TIMESTAMP_TZ(9) for event timestamps with timezone.

`metrics[]`

Metrics are aggregate expressions, the computed measures your business cares about.

Field	Type	Required	Description
`name`	string	Yes	Logical name. `UPPER_SNAKE_CASE`. Must be unique within the table.
`expr`	string	Yes	Aggregation expression. Must start with an aggregation function: `SUM`, `AVG`, `COUNT`, `MIN`, or `MAX`.
`description`	string	No	Business-friendly description of what this metric measures. Present tense. 200 characters maximum.
`access_modifier`	string	No	Controls query visibility. Valid values: `public_access` (end users can query), `private_access` (internal only, excluded from Cortex Analyst results). Defaults to `private_access` if omitted.

Metric expression rules

expr must start with an aggregation function. Bare column references aren't valid metrics.
Nested aggregations (for example, AVG(SUM(col))) aren't supported, use a single aggregation.
For COUNT(DISTINCT col) patterns, test carefully, Cortex Analyst support varies by account.

Valid expressions:

expr: "SUM(ANNUAL_RECURRING_REVENUE)"
expr: "COUNT(DISTINCT OPPORTUNITY_ID)"
expr: "AVG(DEAL_SIZE)"

Invalid expressions:

expr: "ANNUAL_RECURRING_REVENUE"          # missing aggregation
expr: "AVG(SUM(DEAL_VALUE))"              # nested aggregate

`filters[]`

Filters are named, reusable WHERE predicates that let users ask filtered questions naturally.

Field	Type	Required	Description
`name`	string	Yes	Logical name. `UPPER_SNAKE_CASE`. Must be unique within the table.
`synonyms`	array	No	Lowercase alternative terms. Two to five terms. Omit rather than leaving empty.
`description`	string	No	What this filter does, in plain language. Present tense. 200 characters maximum.
`expr`	string	Yes	A boolean SQL WHERE predicate. Must evaluate to `TRUE`/`FALSE`. Never use an aggregation function here.

Valid expressions:

expr: "STAGE_NAME NOT IN ('Closed Won', 'Closed Lost')"
expr: "REGION = 'EMEA'"
expr: "IS_ACTIVE = TRUE"
expr: "CLOSE_DATE >= DATEADD(year, -1, CURRENT_DATE())"

Invalid expressions:

expr: "SUM(REVENUE) > 1000000"    # aggregation in a filter, use a metric instead
expr: "COUNT(ID)"                  # aggregation, not a predicate

`relationships[]`

Relationships define how tables in the semantic model join to each other. Cortex Analyst uses these when a query spans multiple tables.

Field	Type	Required	Description
`name`	string	Yes	Logical name for this relationship. `UPPER_SNAKE_CASE`.
`left_table`	string	Yes	Logical alias of the left-hand table. Must match a `name` in `tables[]`.
`right_table`	string	Yes	Logical alias of the right-hand table. Must match a `name` in `tables[]`.
`join_type`	string	No	`LEFT`, `INNER`, or `FULL`. Defaults to `LEFT` if omitted.
`relationship_columns`	array	Yes	One or more column pairs defining the join condition. Each pair has `left_column` and `right_column`.

relationships:
  - name: OPPORTUNITIES_TO_ACCOUNTS
    left_table: OPPORTUNITIES
    right_table: ACCOUNTS
    join_type: LEFT
    relationship_columns:
      - left_column: ACCOUNT_ID
        right_column: ACCOUNT_ID

note

CES generates relationships entries automatically from lineage and query history when you build a multi-table context repository. Review Context Agents Studio's join recommendations in the Thread tab before deploying.

Many-to-many (Snowflake)

Cortex Analyst semantic views don't directly support many-to-many relationships. If your domain has one, model it with a shared dimension (bridge) table and represent the relationship as two MANY_TO_ONE joins through the bridge. See Snowflake's best practices.

`module_custom_instructions`

Optional free-text instructions that Cortex Analyst receives alongside the semantic model when generating SQL.

Field	Type	Description
`sql_generation`	string	Instructions for how Cortex generates SQL, date filter rules, preferred join paths, columns to avoid.
`question_categorization`	string	Instructions for how Cortex interprets question intent, disambiguating terms that map to multiple tables.

tip

Keep custom instructions focused. Long or contradictory instructions reduce accuracy. One rule per instruction, tested with your eval suite.

Naming conventions

Element	Convention	Example
Model `name`	`UPPER_SNAKE_CASE`	`SALES_PIPELINE`
Table `name` (alias)	`UPPER_SNAKE_CASE`	`OPPORTUNITIES`
Dimension / time dimension `name`	`UPPER_SNAKE_CASE`	`STAGE_NAME`
Metric `name`	`UPPER_SNAKE_CASE`	`TOTAL_ARR`
Filter `name`	`UPPER_SNAKE_CASE`	`OPEN_OPPORTUNITIES`
`synonyms` entries	lowercase	`"deal stage"`, `"pipeline stage"`
Descriptions	Present tense, plain language	`"Sum of annual recurring revenue."`

Constraints summary

Rule	Applies to	Detail
Name casing	All `name` fields	`UPPER_SNAKE_CASE` only.
Metric `expr` starts with aggregate	`metrics[].expr`	First token must be `SUM`, `AVG`, `COUNT`, `MIN`, or `MAX`.
Filter `expr` is boolean	`filters[].expr`	Must evaluate to `TRUE`/`FALSE`. No aggregation functions.
`synonyms` not empty	`dimensions`, `time_dimensions`, `filters`	Provide 2–5 lowercase terms or omit the field entirely. An empty array `[]` causes validation errors.
Synonym entry length	`synonyms` entries	Each synonym truncated at 128 characters on deploy.
Description length	All `description` fields	200 characters maximum. Longer values are truncated on deploy.
No nested aggregates	`metrics[].expr`	Patterns like `AVG(SUM(x))` are invalid.

Validation and autofix

Issue	Autofix	Manual action required
`synonyms` array is empty	Yes, field removed	-
Synonym entry exceeds 128 characters	Yes, truncated at 128 characters	Review truncated synonym for meaning
Descriptions exceed 200 characters	Yes, truncated at word boundary	Review truncated text
Lowercase metric or dimension names	Yes, converted to `UPPER_SNAKE_CASE`	-
Nested aggregate in metric `expr`	Yes, flattened to innermost aggregate	Verify the simplified expr is correct
Boolean column compared with `= 1`	Yes, rewritten to `= TRUE`	-
Column name with backticks or quotes	Yes, quotes stripped	-

A complete Databricks Metric View v1.1 definition with all major sections. Inline comments explain each field.

version: "1.1"                              # required, always "1.1"
source: prod_catalog.sales.opportunities    # fully qualified: catalog.schema.table
comment: "Active and closed sales opportunities tracked in Salesforce."

filter: "IS_DELETED = FALSE"                # optional global WHERE predicate

dimensions:
  - name: stage_name
    expr: STAGE_NAME
    comment: "Current stage of the opportunity in the sales cycle."
    display_name: "Stage"
    synonyms: ["stage", "pipeline stage", "deal stage"]
    format: "text"

measures:
  - name: total_arr
    expr: "SUM(ANNUAL_RECURRING_REVENUE)"   # must contain an aggregate function
    comment: "Sum of annual recurring revenue across all opportunities."
    window: false

joins:
  - name: accounts
    source: prod_catalog.sales.accounts     # fully qualified table
    on: "opportunities.ACCOUNT_ID = accounts.ACCOUNT_ID"

Top-level fields

Field	Type	Required	Description
`version`	string	Yes	Always `"1.1"`.
`source`	string or object	Yes	Fully qualified primary table. Accepts shorthand `catalog.schema.table` or structured form with `time_column`. See below.
`comment`	string	No	Business-friendly description of the metric view. Equivalent to Snowflake's `description`.
`filter`	string	No	Global SQL WHERE predicate applied to all queries on this view. See Filter rules.
`dimensions`	array	No	Categorical attributes. See `dimensions[]`.
`measures`	array	No	Aggregate measures. See `measures[]`.
`joins`	array	No	Additional tables joined to the primary source. See `joins[]`.

`source` structured form

When you need to specify a default time column for time-series queries, use the structured form instead of the dotted string:

source:
  catalog: prod_catalog
  schema: sales
  table: opportunities
  time_column: CLOSE_DATE     # default column for time-based filtering

The time_column value must be the exact column name in the source table. Genie uses it when a question mentions a time period but doesn't name a specific date column.

DDL wrapping

CES creates each Metric View with Databricks's official WITH METRICS LANGUAGE YAML syntax, documented in Databricks Metric Views:

CREATE OR REPLACE VIEW <catalog>.<schema>.<view_name>
  WITH METRICS LANGUAGE YAML
  AS $$
version: 1.1
source: prod_catalog.sales.opportunities
measures:
  - name: total_arr
    expr: "SUM(ANNUAL_RECURRING_REVENUE)"
    comment: "Total annual recurring revenue."
$$;

WITH METRICS LANGUAGE YAML signals to the Databricks runtime that the body is a Metric View definition, not a standard view. The view appears in Unity Catalog and is recognized as an AI/BI Metric View when referenced by a Genie Space.

The YAML body sits inside a $$ ... $$ dollar-quoted block. A literal $$ inside the YAML terminates the delimiter early and breaks the DDL, which is why the filter field blocks $$.

`dimensions[]`

Field	Type	Required	Description
`name`	string	Yes	Logical name for this dimension.
`expr`	string	Yes	Column reference in the source table. Use the exact column name.
`comment`	string	No	Business-friendly description. Equivalent to Snowflake's `description`.
`display_name`	string	No	Human-readable label shown in Genie Space.
`synonyms`	array	No	Lowercase alternative names. Two to five terms. Omit rather than leaving empty.
`format`	string	No	Display format hint. Common values: `"text"`, `"number"`, `"date"`, `"currency"`.

note

Unlike Snowflake, Databricks Metric View doesn't support data_type or sample_values on dimensions. These fields are dropped automatically when CES converts a Snowflake YAML to Databricks format.

`measures[]`

Measures are aggregate expressions, equivalent to metrics in the Snowflake format.

Field	Type	Required	Description
`name`	string	Yes	Logical name for this measure.
`expr`	string	Yes	Aggregation expression. Must contain an aggregation function: `SUM`, `AVG`, `COUNT`, `MIN`, or `MAX`.
`comment`	string	No	Business-friendly description of what this measure calculates.
`window`	boolean	No	Set to `true` if this measure uses a window function. Defaults to `false`.

Measure expression rules

expr must include an aggregation. Bare column references aren't valid measures.
Nested aggregations (for example, AVG(SUM(col))) are invalid and are flattened automatically to the innermost aggregate during validation.
Don't reference another measure's name inside an expr: use the underlying column directly.

Valid expressions:

expr: "SUM(ANNUAL_RECURRING_REVENUE)"
expr: "COUNT(DISTINCT OPPORTUNITY_ID)"
expr: "AVG(DEAL_SIZE)"

Invalid expressions:

expr: "ANNUAL_RECURRING_REVENUE"     # missing aggregation
expr: "AVG(SUM(DEAL_VALUE))"         # nested aggregate, flattened to SUM(DEAL_VALUE)
expr: "SUM(total_arr)"               # references another measure name, not a column

`joins[]`

Joins connect additional tables to the primary source for multi-table queries.

Field	Type	Required	Description
`name`	string	Yes	Alias used to reference this joined table in expressions.
`source`	string	Yes	Fully qualified table: `catalog.schema.table`.
`on`	string	No†	SQL join condition. For example: `"opportunities.ACCOUNT_ID = accounts.ACCOUNT_ID"`.
`using`	string	No†	Column name used for an equi-join when the column name is identical in both tables.

†Provide either on or using, not both.

Filter rules

The top-level filter field applies a global WHERE predicate to every query on the metric view.

Valid filter expressions:

filter: "IS_DELETED = FALSE"
filter: "REGION = 'EMEA'"
filter: "IS_ACTIVE = TRUE"

Blocked patterns: CES validation rejects these to prevent SQL injection:

filter: "col = 'value$$other'"     # $$ is the Databricks YAML delimiter
filter: "col = 'val'; DROP TABLE"  # semicolons blocked
filter: "CREATE TABLE ..."         # DDL keywords blocked

caution

The filter field is a plain string embedded directly in the metric view DDL. CES blocks $$, semicolons, and DDL keywords (CREATE, DROP, ALTER, INSERT, UPDATE, DELETE) to prevent injection. If validation rejects your filter, rewrite it as a simple boolean predicate.

Differences from Snowflake format

When CES converts a Snowflake semantic model YAML to Databricks Metric View format, the following transformations apply automatically:

Snowflake field	Databricks equivalent	Notes
`name` (top-level)	n/a	Not used in Metric View; view name set at deploy time
`tables[].description`	`comment`	Field rename
`tables[].base_table.{database,schema,table}`	`source`	Collapsed to `catalog.schema.table`
`metrics[]`	`measures[]`	Array rename; bare column refs wrapped in `SUM()`
`dimensions[].description`	`dimensions[].comment`	Field rename
`relationships[]`	`joins[]`	Array rename; `relationship` type → `on`/`using` syntax
`dimensions[].data_type`	-	Dropped, not supported
`dimensions[].sample_values`	-	Dropped, not supported
`module_custom_instructions`	-	Dropped, not supported
Boolean `= 1` comparisons	`= TRUE`	Autofixed by validator
Quoted identifiers (backticks, `"`)	Unquoted	Stripped by validator
Nested aggregates	Flattened	`AVG(SUM(x))` becomes `SUM(x)`

Naming conventions

Element	Convention	Example
`source`	`catalog.schema.table`	`prod_catalog.sales.opportunities`
Dimension `name`	lowercase with underscores	`stage_name`
Measure `name`	lowercase with underscores	`total_arr`
Join `name`	lowercase with underscores	`accounts`
`synonyms` entries	lowercase	`"deal stage"`, `"pipeline stage"`
`comment` fields	Present tense, plain language	`"Sum of annual recurring revenue."`

Constraints summary

Rule	Applies to	Detail
Measure `expr` contains aggregate	`measures[].expr`	Must include `SUM`, `AVG`, `COUNT`, `MIN`, or `MAX`.
Filter blocks injection patterns	`filter`	No `$$`, `;`, or DDL keywords.
No nested aggregates	`measures[].expr`	Flattened automatically; verify result.
`synonyms` not empty	`dimensions[]`	Provide 2–5 terms or omit entirely.
Synonym entry length	`synonyms` entries	Each synonym truncated at 100 characters on deploy.
Boolean comparisons	`filter`, `dimensions[].expr`	Use `= TRUE` / `= FALSE`, not `= 1` / `= 0`.
No quoted identifiers	`expr` fields	Backticks and double quotes are stripped.
No cross-measure references	`measures[].expr`	Reference source columns, not other measure names.

Validation and autofix

Issue	Autofix	Manual action required
Boolean column compared with `= 1`	Yes, rewritten to `= TRUE`	-
Nested aggregate in measure `expr`	Yes, flattened to innermost aggregate	Verify the simplified expr is correct
Quoted identifiers in `expr`	Yes, quotes stripped	-
`synonyms` array is empty	Yes, field removed	-
Synonym entry exceeds 100 characters	Yes, truncated at 100 characters	Review truncated synonym for meaning
`$$` in `filter`	No	Rewrite filter without `$$`
DDL keyword in `filter`	No	Rewrite filter as a simple boolean predicate
Missing aggregation in measure `expr`	No	Add `SUM()` or the appropriate aggregate

Top-level fields​

tables[]​

base_table​

dimensions[]​

time_dimensions[]​

metrics[]​

Metric expression rules​

filters[]​

relationships[]​

module_custom_instructions​

Naming conventions​

Constraints summary​

Validation and autofix​

Top-level fields​

source structured form​

DDL wrapping​

dimensions[]​

measures[]​

Measure expression rules​

joins[]​

Filter rules​

Differences from Snowflake format​

Naming conventions​

Constraints summary​

Validation and autofix​

See also​

Top-level fields

`tables[]`

`base_table`

`dimensions[]`

`time_dimensions[]`

`metrics[]`

Metric expression rules

`filters[]`

`relationships[]`

`module_custom_instructions`

Naming conventions

Constraints summary

Validation and autofix

Top-level fields

`source` structured form

DDL wrapping

`dimensions[]`

`measures[]`

Measure expression rules

`joins[]`

Filter rules

Differences from Snowflake format

Naming conventions

Constraints summary

Validation and autofix

See also