Context repository YAML schema Private Preview
Context Engineering Studio generates different YAML formats depending on your target engine. Select your engine below.
- Snowflake Cortex Analyst
- Databricks Metric View
A complete Snowflake Cortex Analyst semantic model with all major sections. Inline comments explain each field.
name: SALES_PIPELINE # repo identifier, UPPER_SNAKE_CASE
tables:
- name: OPPORTUNITIES # logical alias used in exprs
description: "Active and closed sales opportunities tracked in Salesforce."
base_table:
database: PROD_DB
schema: SALES
table: OPPORTUNITIES
dimensions:
- name: STAGE_NAME
expr: STAGE_NAME
data_type: VARCHAR
description: "Current stage of the opportunity in the sales cycle."
sample_values: ["Prospecting", "Negotiation", "Closed Won"]
synonyms: ["stage", "pipeline stage", "deal stage"]
time_dimensions:
- name: CLOSE_DATE
expr: CLOSE_DATE
data_type: DATE
description: "Date the opportunity is expected to close or was closed."
synonyms: ["close date", "expected close", "deal close"]
metrics:
- name: TOTAL_ARR
expr: "SUM(ANNUAL_RECURRING_REVENUE)"
description: "Sum of annual recurring revenue across all opportunities."
access_modifier: public_access
filters:
- name: OPEN_OPPORTUNITIES
synonyms: ["active", "open deals", "in-flight"]
description: "Filters to opportunities that have not yet been closed."
expr: "STAGE_NAME NOT IN ('Closed Won', 'Closed Lost')"
module_custom_instructions:
sql_generation: "ALWAYS filter by CLOSE_DATE when questions mention a time period."
question_categorization: "Questions about pipeline health refer to open opportunities only."
Per Snowflake's Cortex Analyst best practices, high-quality descriptions aren't optional, they're required. The YAML schema marks dimension and column descriptions as optional for syntactic reasons, but deploying without them measurably hurts answer quality. Write descriptions that explain proprietary terms and abbreviations explicitly; don't assume shared knowledge.
Top-level fields
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Identifier for the semantic model. Use UPPER_SNAKE_CASE. Must be unique within the deployment target. |
tables | array | Yes | One or more table definitions. See Tables. Start with 5–10 tables for a POC, scale to more focused semantic views rather than one wide view. |
module_custom_instructions | object | No | Free-text SQL generation hints passed to Cortex Analyst at query time. See Custom instructions. |
tables[]
Each entry in tables defines one logical table, the mapping between a physical Snowflake table and the business concepts exposed from it.
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Logical alias for this table. Referenced by expr fields in dimensions and metrics. UPPER_SNAKE_CASE. |
description | string | Yes | Business-friendly description of what this table represents. Present tense. 200 characters maximum. |
base_table | object | Yes | Fully qualified physical table. See base_table. |
dimensions | array | No | Categorical or text attributes. See dimensions[]. |
time_dimensions | array | No | Date or timestamp columns for time-series queries. See time_dimensions[]. |
metrics | array | No | Aggregate measures. See metrics[]. |
filters | array | No | Named reusable WHERE predicates. See filters[]. |
base_table
| Field | Type | Required | Description |
|---|---|---|---|
database | string | Yes | Snowflake database name. |
schema | string | Yes | Schema within the database. |
table | string | Yes | Physical table or view name. |
dimensions[]
Dimensions are categorical, text, or numeric attributes used to group and filter results.
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Logical name. UPPER_SNAKE_CASE. Must be unique within the table. |
expr | string | Yes | Column reference in the physical table. Use the exact column name as it appears in Snowflake. |
data_type | string | No | Snowflake data type. Common values: VARCHAR, NUMBER, BOOLEAN, DATE, TIMESTAMP_TZ. |
description | string | No | Business-friendly description. Present tense. 200 characters maximum. |
sample_values | array | No | Representative string values. Three or more recommended. Helps Cortex Analyst understand the domain of this column. |
synonyms | array | No | Lowercase alternative names users might say. Use sparingly, see note below. Don't leave this as an empty array, omit the field entirely if you have no synonyms. |
sample_values are especially useful for low-cardinality columns like status fields, region codes, and product categories. They anchor Cortex Analyst's filter generation to real data values.
Per Snowflake's Cortex Analyst best practices, avoid synonyms unless the term is unique or industry-specific. Generic synonyms consume tokens without meaningful accuracy improvement. Prefer a sharper description to disambiguate terms when possible.
time_dimensions[]
Time dimensions are date or timestamp columns intended for time-series filtering and aggregation.
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Logical name. UPPER_SNAKE_CASE. Must be unique within the table. |
expr | string | Yes | Column reference. Use the exact column name. |
data_type | string | Yes | Must be DATE or TIMESTAMP_TZ(9). |
description | string | No | Business-friendly description. Present tense. 200 characters maximum. |
synonyms | array | No | Lowercase alternative names. Two to five terms. Omit rather than leaving empty. |
Cortex Analyst uses the data_type of time dimensions to decide how to apply date truncation and range filters. Use DATE for calendar date columns. Use TIMESTAMP_TZ(9) for event timestamps with timezone.
metrics[]
Metrics are aggregate expressions, the computed measures your business cares about.
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Logical name. UPPER_SNAKE_CASE. Must be unique within the table. |
expr | string | Yes | Aggregation expression. Must start with an aggregation function: SUM, AVG, COUNT, MIN, or MAX. |
description | string | No | Business-friendly description of what this metric measures. Present tense. 200 characters maximum. |
access_modifier | string | No | Controls query visibility. Valid values: public_access (end users can query), private_access (internal only, excluded from Cortex Analyst results). Defaults to private_access if omitted. |
Metric expression rules
exprmust start with an aggregation function. Bare column references aren't valid metrics.- Nested aggregations (for example,
AVG(SUM(col))) aren't supported, use a single aggregation. - For
COUNT(DISTINCT col)patterns, test carefully, Cortex Analyst support varies by account.
Valid expressions:
expr: "SUM(ANNUAL_RECURRING_REVENUE)"
expr: "COUNT(DISTINCT OPPORTUNITY_ID)"
expr: "AVG(DEAL_SIZE)"
Invalid expressions:
expr: "ANNUAL_RECURRING_REVENUE" # missing aggregation
expr: "AVG(SUM(DEAL_VALUE))" # nested aggregate
filters[]
Filters are named, reusable WHERE predicates that let users ask filtered questions naturally.
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Logical name. UPPER_SNAKE_CASE. Must be unique within the table. |
synonyms | array | No | Lowercase alternative terms. Two to five terms. Omit rather than leaving empty. |
description | string | No | What this filter does, in plain language. Present tense. 200 characters maximum. |
expr | string | Yes | A boolean SQL WHERE predicate. Must evaluate to TRUE/FALSE. Never use an aggregation function here. |
Valid expressions:
expr: "STAGE_NAME NOT IN ('Closed Won', 'Closed Lost')"
expr: "REGION = 'EMEA'"
expr: "IS_ACTIVE = TRUE"
expr: "CLOSE_DATE >= DATEADD(year, -1, CURRENT_DATE())"
Invalid expressions:
expr: "SUM(REVENUE) > 1000000" # aggregation in a filter, use a metric instead
expr: "COUNT(ID)" # aggregation, not a predicate
relationships[]
Relationships define how tables in the semantic model join to each other. Cortex Analyst uses these when a query spans multiple tables.
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Logical name for this relationship. UPPER_SNAKE_CASE. |
left_table | string | Yes | Logical alias of the left-hand table. Must match a name in tables[]. |
right_table | string | Yes | Logical alias of the right-hand table. Must match a name in tables[]. |
join_type | string | No | LEFT, INNER, or FULL. Defaults to LEFT if omitted. |
relationship_columns | array | Yes | One or more column pairs defining the join condition. Each pair has left_column and right_column. |
relationships:
- name: OPPORTUNITIES_TO_ACCOUNTS
left_table: OPPORTUNITIES
right_table: ACCOUNTS
join_type: LEFT
relationship_columns:
- left_column: ACCOUNT_ID
right_column: ACCOUNT_ID
CES generates relationships entries automatically from lineage and query history when you build a multi-table context repository. Review Context Agents Studio's join recommendations in the Thread tab before deploying.
Cortex Analyst semantic views don't directly support many-to-many relationships. If your domain has one, model it with a shared dimension (bridge) table and represent the relationship as two MANY_TO_ONE joins through the bridge. See Snowflake's best practices.
module_custom_instructions
Optional free-text instructions that Cortex Analyst receives alongside the semantic model when generating SQL.
| Field | Type | Description |
|---|---|---|
sql_generation | string | Instructions for how Cortex generates SQL, date filter rules, preferred join paths, columns to avoid. |
question_categorization | string | Instructions for how Cortex interprets question intent, disambiguating terms that map to multiple tables. |
Keep custom instructions focused. Long or contradictory instructions reduce accuracy. One rule per instruction, tested with your eval suite.
Naming conventions
| Element | Convention | Example |
|---|---|---|
Model name | UPPER_SNAKE_CASE | SALES_PIPELINE |
Table name (alias) | UPPER_SNAKE_CASE | OPPORTUNITIES |
Dimension / time dimension name | UPPER_SNAKE_CASE | STAGE_NAME |
Metric name | UPPER_SNAKE_CASE | TOTAL_ARR |
Filter name | UPPER_SNAKE_CASE | OPEN_OPPORTUNITIES |
synonyms entries | lowercase | "deal stage", "pipeline stage" |
| Descriptions | Present tense, plain language | "Sum of annual recurring revenue." |
Constraints summary
| Rule | Applies to | Detail |
|---|---|---|
| Name casing | All name fields | UPPER_SNAKE_CASE only. |
Metric expr starts with aggregate | metrics[].expr | First token must be SUM, AVG, COUNT, MIN, or MAX. |
Filter expr is boolean | filters[].expr | Must evaluate to TRUE/FALSE. No aggregation functions. |
synonyms not empty | dimensions, time_dimensions, filters | Provide 2–5 lowercase terms or omit the field entirely. An empty array [] causes validation errors. |
| Synonym entry length | synonyms entries | Each synonym truncated at 128 characters on deploy. |
| Description length | All description fields | 200 characters maximum. Longer values are truncated on deploy. |
| No nested aggregates | metrics[].expr | Patterns like AVG(SUM(x)) are invalid. |
Validation and autofix
| Issue | Autofix | Manual action required |
|---|---|---|
synonyms array is empty | Yes, field removed | - |
| Synonym entry exceeds 128 characters | Yes, truncated at 128 characters | Review truncated synonym for meaning |
| Descriptions exceed 200 characters | Yes, truncated at word boundary | Review truncated text |
| Lowercase metric or dimension names | Yes, converted to UPPER_SNAKE_CASE | - |
Nested aggregate in metric expr | Yes, flattened to innermost aggregate | Verify the simplified expr is correct |
Boolean column compared with = 1 | Yes, rewritten to = TRUE | - |
| Column name with backticks or quotes | Yes, quotes stripped | - |
A complete Databricks Metric View v1.1 definition with all major sections. Inline comments explain each field.
version: "1.1" # required, always "1.1"
source: prod_catalog.sales.opportunities # fully qualified: catalog.schema.table
comment: "Active and closed sales opportunities tracked in Salesforce."
filter: "IS_DELETED = FALSE" # optional global WHERE predicate
dimensions:
- name: stage_name
expr: STAGE_NAME
comment: "Current stage of the opportunity in the sales cycle."
display_name: "Stage"
synonyms: ["stage", "pipeline stage", "deal stage"]
format: "text"
measures:
- name: total_arr
expr: "SUM(ANNUAL_RECURRING_REVENUE)" # must contain an aggregate function
comment: "Sum of annual recurring revenue across all opportunities."
window: false
joins:
- name: accounts
source: prod_catalog.sales.accounts # fully qualified table
on: "opportunities.ACCOUNT_ID = accounts.ACCOUNT_ID"
Top-level fields
| Field | Type | Required | Description |
|---|---|---|---|
version | string | Yes | Always "1.1". |
source | string or object | Yes | Fully qualified primary table. Accepts shorthand catalog.schema.table or structured form with time_column. See below. |
comment | string | No | Business-friendly description of the metric view. Equivalent to Snowflake's description. |
filter | string | No | Global SQL WHERE predicate applied to all queries on this view. See Filter rules. |
dimensions | array | No | Categorical attributes. See dimensions[]. |
measures | array | No | Aggregate measures. See measures[]. |
joins | array | No | Additional tables joined to the primary source. See joins[]. |
source structured form
When you need to specify a default time column for time-series queries, use the structured form instead of the dotted string:
source:
catalog: prod_catalog
schema: sales
table: opportunities
time_column: CLOSE_DATE # default column for time-based filtering
The time_column value must be the exact column name in the source table. Genie uses it when a question mentions a time period but doesn't name a specific date column.
DDL wrapping
CES creates each Metric View with Databricks's official WITH METRICS LANGUAGE YAML syntax, documented in Databricks Metric Views:
CREATE OR REPLACE VIEW <catalog>.<schema>.<view_name>
WITH METRICS LANGUAGE YAML
AS $$
version: 1.1
source: prod_catalog.sales.opportunities
measures:
- name: total_arr
expr: "SUM(ANNUAL_RECURRING_REVENUE)"
comment: "Total annual recurring revenue."
$$;
WITH METRICS LANGUAGE YAML signals to the Databricks runtime that the body is a Metric View definition, not a standard view. The view appears in Unity Catalog and is recognized as an AI/BI Metric View when referenced by a Genie Space.
The YAML body sits inside a $$ ... $$ dollar-quoted block. A literal $$ inside the YAML terminates the delimiter early and breaks the DDL, which is why the filter field blocks $$.
dimensions[]
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Logical name for this dimension. |
expr | string | Yes | Column reference in the source table. Use the exact column name. |
comment | string | No | Business-friendly description. Equivalent to Snowflake's description. |
display_name | string | No | Human-readable label shown in Genie Space. |
synonyms | array | No | Lowercase alternative names. Two to five terms. Omit rather than leaving empty. |
format | string | No | Display format hint. Common values: "text", "number", "date", "currency". |
Unlike Snowflake, Databricks Metric View doesn't support data_type or sample_values on dimensions. These fields are dropped automatically when CES converts a Snowflake YAML to Databricks format.
measures[]
Measures are aggregate expressions, equivalent to metrics in the Snowflake format.
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Logical name for this measure. |
expr | string | Yes | Aggregation expression. Must contain an aggregation function: SUM, AVG, COUNT, MIN, or MAX. |
comment | string | No | Business-friendly description of what this measure calculates. |
window | boolean | No | Set to true if this measure uses a window function. Defaults to false. |
Measure expression rules
exprmust include an aggregation. Bare column references aren't valid measures.- Nested aggregations (for example,
AVG(SUM(col))) are invalid and are flattened automatically to the innermost aggregate during validation. - Don't reference another measure's
nameinside anexpr: use the underlying column directly.
Valid expressions:
expr: "SUM(ANNUAL_RECURRING_REVENUE)"
expr: "COUNT(DISTINCT OPPORTUNITY_ID)"
expr: "AVG(DEAL_SIZE)"
Invalid expressions:
expr: "ANNUAL_RECURRING_REVENUE" # missing aggregation
expr: "AVG(SUM(DEAL_VALUE))" # nested aggregate, flattened to SUM(DEAL_VALUE)
expr: "SUM(total_arr)" # references another measure name, not a column
joins[]
Joins connect additional tables to the primary source for multi-table queries.
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Alias used to reference this joined table in expressions. |
source | string | Yes | Fully qualified table: catalog.schema.table. |
on | string | No† | SQL join condition. For example: "opportunities.ACCOUNT_ID = accounts.ACCOUNT_ID". |
using | string | No† | Column name used for an equi-join when the column name is identical in both tables. |
†Provide either on or using, not both.
Filter rules
The top-level filter field applies a global WHERE predicate to every query on the metric view.
Valid filter expressions:
filter: "IS_DELETED = FALSE"
filter: "REGION = 'EMEA'"
filter: "IS_ACTIVE = TRUE"
Blocked patterns: CES validation rejects these to prevent SQL injection:
filter: "col = 'value$$other'" # $$ is the Databricks YAML delimiter
filter: "col = 'val'; DROP TABLE" # semicolons blocked
filter: "CREATE TABLE ..." # DDL keywords blocked
The filter field is a plain string embedded directly in the metric view DDL. CES blocks $$, semicolons, and DDL keywords (CREATE, DROP, ALTER, INSERT, UPDATE, DELETE) to prevent injection. If validation rejects your filter, rewrite it as a simple boolean predicate.
Differences from Snowflake format
When CES converts a Snowflake semantic model YAML to Databricks Metric View format, the following transformations apply automatically:
| Snowflake field | Databricks equivalent | Notes |
|---|---|---|
name (top-level) | n/a | Not used in Metric View; view name set at deploy time |
tables[].description | comment | Field rename |
tables[].base_table.{database,schema,table} | source | Collapsed to catalog.schema.table |
metrics[] | measures[] | Array rename; bare column refs wrapped in SUM() |
dimensions[].description | dimensions[].comment | Field rename |
relationships[] | joins[] | Array rename; relationship type → on/using syntax |
dimensions[].data_type | - | Dropped, not supported |
dimensions[].sample_values | - | Dropped, not supported |
module_custom_instructions | - | Dropped, not supported |
Boolean = 1 comparisons | = TRUE | Autofixed by validator |
Quoted identifiers (backticks, ") | Unquoted | Stripped by validator |
| Nested aggregates | Flattened | AVG(SUM(x)) becomes SUM(x) |
Naming conventions
| Element | Convention | Example |
|---|---|---|
source | catalog.schema.table | prod_catalog.sales.opportunities |
Dimension name | lowercase with underscores | stage_name |
Measure name | lowercase with underscores | total_arr |
Join name | lowercase with underscores | accounts |
synonyms entries | lowercase | "deal stage", "pipeline stage" |
comment fields | Present tense, plain language | "Sum of annual recurring revenue." |
Constraints summary
| Rule | Applies to | Detail |
|---|---|---|
Measure expr contains aggregate | measures[].expr | Must include SUM, AVG, COUNT, MIN, or MAX. |
| Filter blocks injection patterns | filter | No $$, ;, or DDL keywords. |
| No nested aggregates | measures[].expr | Flattened automatically; verify result. |
synonyms not empty | dimensions[] | Provide 2–5 terms or omit entirely. |
| Synonym entry length | synonyms entries | Each synonym truncated at 100 characters on deploy. |
| Boolean comparisons | filter, dimensions[].expr | Use = TRUE / = FALSE, not = 1 / = 0. |
| No quoted identifiers | expr fields | Backticks and double quotes are stripped. |
| No cross-measure references | measures[].expr | Reference source columns, not other measure names. |
Validation and autofix
| Issue | Autofix | Manual action required |
|---|---|---|
Boolean column compared with = 1 | Yes, rewritten to = TRUE | - |
Nested aggregate in measure expr | Yes, flattened to innermost aggregate | Verify the simplified expr is correct |
Quoted identifiers in expr | Yes, quotes stripped | - |
synonyms array is empty | Yes, field removed | - |
| Synonym entry exceeds 100 characters | Yes, truncated at 100 characters | Review truncated synonym for meaning |
$$ in filter | No | Rewrite filter without $$ |
DDL keyword in filter | No | Rewrite filter as a simple boolean predicate |
Missing aggregation in measure expr | No | Add SUM() or the appropriate aggregate |
See also
- Build: create a repository and refine the semantic model
- Snowflake engine flow
- Databricks engine flow
- Troubleshooting