Context repository YAML schema (Private Preview)

Context Engineering Studio generates different YAML formats depending on your target engine. Select your engine below.

The example below is a complete Snowflake Cortex Analyst semantic model for a single table. Inline comments flag key fields; the reference sections that follow explain each field in detail.

name: SALES_PIPELINE                         # repo identifier, UPPER_SNAKE_CASE

tables:
  - name: OPPORTUNITIES                      # logical alias used in exprs
    description: "Active and closed sales opportunities tracked in Salesforce."
    base_table:
      database: PROD_DB
      schema: SALES
      table: OPPORTUNITIES
    dimensions:
      - name: STAGE_NAME
        expr: STAGE_NAME
        data_type: VARCHAR
        description: "Current stage of the opportunity in the sales cycle."
        sample_values: ["Prospecting", "Negotiation", "Closed Won"]
        synonyms: ["stage", "pipeline stage", "deal stage"]
    time_dimensions:
      - name: CLOSE_DATE
        expr: CLOSE_DATE
        data_type: DATE
        description: "Date the opportunity is expected to close or was closed."
        synonyms: ["close date", "expected close", "deal close"]
    metrics:
      - name: TOTAL_ARR
        expr: "SUM(ANNUAL_RECURRING_REVENUE)"
        description: "Sum of annual recurring revenue across all opportunities."
        access_modifier: public_access
    filters:
      - name: OPEN_OPPORTUNITIES
        synonyms: ["active", "open deals", "in-flight"]
        description: "Filters to opportunities that have not yet been closed."
        expr: "STAGE_NAME NOT IN ('Closed Won', 'Closed Lost')"

module_custom_instructions:
  sql_generation: "ALWAYS filter by CLOSE_DATE when questions mention a time period."
  question_categorization: "Questions about pipeline health refer to open opportunities only."

Descriptions are required for accuracy (Snowflake)

Per Snowflake's Cortex Analyst best practices, treat high-quality descriptions as required, not optional. The YAML schema marks dimension and column descriptions as optional for syntactic reasons, but deploying without them measurably hurts answer quality. Write descriptions that explain proprietary terms and abbreviations explicitly; don't assume shared knowledge.
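
As a rough illustration of this guidance, the sketch below contrasts a terse description with one that spells out its abbreviations. The DEAL_TYPE column and its NB/EXP/RNW codes are hypothetical, chosen only for illustration; the fragment would sit under an entry in tables.

```yaml
dimensions:
  # Weak version (commented out): assumes the codes are already understood.
  # - name: DEAL_TYPE
  #   expr: DEAL_TYPE
  #   description: "Deal type."

  # Stronger version: explains each abbreviation and stays under 200 characters.
  - name: DEAL_TYPE                # hypothetical column, for illustration only
    expr: DEAL_TYPE
    data_type: VARCHAR
    description: "Type of deal. NB is new business, EXP is expansion of an existing account, RNW is renewal of an existing contract."
```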

Top-level fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Identifier for the semantic model. Use UPPER_SNAKE_CASE. Must be unique within the deployment target. |
| tables | array | Yes | One or more table definitions. See tables[]. Start with 5–10 tables for a POC, then scale out to several focused semantic views rather than one wide view. |
| module_custom_instructions | object | No | Free-text SQL generation hints passed to Cortex Analyst at query time. See module_custom_instructions. |

tables[]

Each entry in tables defines one logical table: the mapping between a physical Snowflake table and the business concepts exposed from it.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Logical alias for this table. Referenced by expr fields in dimensions and metrics. UPPER_SNAKE_CASE. |
| description | string | Yes | Business-friendly description of what this table represents. Present tense. 200 characters maximum. |
| base_table | object | Yes | Fully qualified physical table. See base_table. |
| dimensions | array | No | Categorical or text attributes. See dimensions[]. |
| time_dimensions | array | No | Date or timestamp columns for time-series queries. See time_dimensions[]. |
| metrics | array | No | Aggregate measures. See metrics[]. |
| filters | array | No | Named reusable WHERE predicates. See filters[]. |

base_table

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| database | string | Yes | Snowflake database name. |
| schema | string | Yes | Schema within the database. |
| table | string | Yes | Physical table or view name. |

dimensions[]

Dimensions are categorical, text, or numeric attributes used to group and filter results.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Logical name. UPPER_SNAKE_CASE. Must be unique within the table. |
| expr | string | Yes | Column reference in the physical table. Use the exact column name as it appears in Snowflake. |
| data_type | string | No | Snowflake data type. Common values: VARCHAR, NUMBER, BOOLEAN, DATE, TIMESTAMP_TZ. |
| description | string | No | Business-friendly description. Present tense. 200 characters maximum. |
| sample_values | array | No | Representative string values. Three or more recommended. Helps Cortex Analyst understand the domain of this column. |
| synonyms | array | No | Lowercase alternative names users might say. Use sparingly; see the Snowflake synonym guidance below. Don't leave this as an empty array; omit the field entirely if you have no synonyms. |

Tip: sample_values are especially useful for low-cardinality columns like status fields, region codes, and product categories. They anchor Cortex Analyst's filter generation to real data values.
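
A minimal sketch of a low-cardinality dimension with sample_values follows. The REGION column matches the one used in the filter examples on this page, but the three region codes are assumed values for illustration, not data from a real account.

```yaml
dimensions:
  - name: REGION
    expr: REGION
    data_type: VARCHAR
    description: "Sales region that owns the opportunity, stored as an uppercase region code."
    # Assumed values; list the literal strings that actually appear in your column
    # so generated filters such as REGION = 'EMEA' match real data.
    sample_values: ["EMEA", "AMER", "APAC"]
```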

Snowflake synonym guidance

Per Snowflake's Cortex Analyst best practices, avoid synonyms unless the term is unique or industry-specific. Generic synonyms consume tokens without meaningful accuracy improvement. Prefer a sharper description to disambiguate terms when possible.
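
One way to apply this guidance is sketched below: the first dimension drops generic synonyms in favor of a sharper description, and the second keeps synonyms because the terms are domain-specific. FORECAST_CATEGORY and its synonyms are hypothetical.

```yaml
dimensions:
  # Generic terms like "stage" add tokens without much signal, so synonyms is omitted
  # and the description carries the disambiguation instead.
  - name: STAGE_NAME
    expr: STAGE_NAME
    data_type: VARCHAR
    description: "Current stage of the opportunity in the sales cycle, from Prospecting through Closed Won or Closed Lost."

  # Industry-specific wording is a more reasonable use of synonyms.
  - name: FORECAST_CATEGORY        # hypothetical column
    expr: FORECAST_CATEGORY
    data_type: VARCHAR
    description: "Forecast bucket the opportunity owner assigns to the deal."
    synonyms: ["commit category", "forecast bucket"]
```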


time_dimensions[]

Time dimensions are date or timestamp columns intended for time-series filtering and aggregation.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Logical name. UPPER_SNAKE_CASE. Must be unique within the table. |
| expr | string | Yes | Column reference. Use the exact column name. |
| data_type | string | Yes | Must be DATE or TIMESTAMP_TZ(9). |
| description | string | No | Business-friendly description. Present tense. 200 characters maximum. |
| synonyms | array | No | Lowercase alternative names. Two to five terms. Omit rather than leaving empty. |

Note: Cortex Analyst uses the data_type of time dimensions to decide how to apply date truncation and range filters. Use DATE for calendar date columns. Use TIMESTAMP_TZ(9) for event timestamps with timezone.
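
The two allowed data types might be used side by side as in the sketch below. CLOSE_DATE comes from the example model; LAST_MODIFIED_AT is a hypothetical event-timestamp column added for illustration.

```yaml
time_dimensions:
  # Calendar date with no time component: use DATE.
  - name: CLOSE_DATE
    expr: CLOSE_DATE
    data_type: DATE
    description: "Date the opportunity is expected to close or was closed."

  # Event timestamp that carries a time zone: use TIMESTAMP_TZ(9).
  - name: LAST_MODIFIED_AT         # hypothetical column
    expr: LAST_MODIFIED_AT
    data_type: TIMESTAMP_TZ(9)
    description: "Timestamp, including time zone, of the most recent change to the opportunity record."
```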


metrics[]

Metrics are aggregate expressions: the computed measures your business cares about.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Logical name. UPPER_SNAKE_CASE. Must be unique within the table. |
| expr | string | Yes | Aggregation expression. Must start with an aggregation function: SUM, AVG, COUNT, MIN, or MAX. |
| description | string | No | Business-friendly description of what this metric measures. Present tense. 200 characters maximum. |
| access_modifier | string | No | Controls query visibility. Valid values: public_access (end users can query), private_access (internal only, excluded from Cortex Analyst results). Defaults to private_access if omitted. |
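
To make the access_modifier default concrete, here is a small sketch with one public and one private metric. TOTAL_ARR comes from the example model; TOTAL_GROSS_MARGIN and its GROSS_MARGIN column are hypothetical.

```yaml
metrics:
  # Queryable by end users because access_modifier is set explicitly.
  - name: TOTAL_ARR
    expr: "SUM(ANNUAL_RECURRING_REVENUE)"
    description: "Sum of annual recurring revenue across all opportunities."
    access_modifier: public_access

  # Hypothetical internal metric. Setting private_access explicitly documents the
  # intent, although omitting the field has the same effect per the table above.
  - name: TOTAL_GROSS_MARGIN
    expr: "SUM(GROSS_MARGIN)"
    description: "Sum of gross margin across all opportunities, for internal use."
    access_modifier: private_access
```

Because the default is private_access, a metric that end users should be able to query needs public_access set explicitly.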

Metric expression rules

  • expr must start with an aggregation function. Bare column references aren't valid metrics.
  • Nested aggregations (for example, AVG(SUM(col))) aren't supported; use a single aggregation.
  • For COUNT(DISTINCT col) patterns, test carefully: Cortex Analyst support varies by account.

Valid expressions:

expr: "SUM(ANNUAL_RECURRING_REVENUE)"
expr: "COUNT(DISTINCT OPPORTUNITY_ID)"
expr: "AVG(DEAL_SIZE)"

Invalid expressions:

expr: "ANNUAL_RECURRING_REVENUE"          # missing aggregation
expr: "AVG(SUM(DEAL_VALUE))" # nested aggregate

filters[]

Filters are named, reusable WHERE predicates that let users ask filtered questions naturally.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Logical name. UPPER_SNAKE_CASE. Must be unique within the table. |
| synonyms | array | No | Lowercase alternative terms. Two to five terms. Omit rather than leaving empty. |
| description | string | No | What this filter does, in plain language. Present tense. 200 characters maximum. |
| expr | string | Yes | A boolean SQL WHERE predicate. Must evaluate to TRUE/FALSE. Never use an aggregation function here. |

Valid expressions:

expr: "STAGE_NAME NOT IN ('Closed Won', 'Closed Lost')"
expr: "REGION = 'EMEA'"
expr: "IS_ACTIVE = TRUE"
expr: "CLOSE_DATE >= DATEADD(year, -1, CURRENT_DATE())"

Invalid expressions:

expr: "SUM(REVENUE) > 1000000"    # aggregation in a filter, use a metric instead
expr: "COUNT(ID)" # aggregation, not a predicate

relationships[]

Relationships define how tables in the semantic model join to each other. Cortex Analyst uses these when a query spans multiple tables.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Logical name for this relationship. UPPER_SNAKE_CASE. |
| left_table | string | Yes | Logical alias of the left-hand table. Must match a name in tables[]. |
| right_table | string | Yes | Logical alias of the right-hand table. Must match a name in tables[]. |
| join_type | string | No | LEFT, INNER, or FULL. Defaults to LEFT if omitted. |
| relationship_columns | array | Yes | One or more column pairs defining the join condition. Each pair has left_column and right_column. |

relationships:
  - name: OPPORTUNITIES_TO_ACCOUNTS
    left_table: OPPORTUNITIES
    right_table: ACCOUNTS
    join_type: LEFT
    relationship_columns:
      - left_column: ACCOUNT_ID
        right_column: ACCOUNT_ID

Note: CES generates relationships entries automatically from lineage and query history when you build a multi-table context repository. Review the join recommendations from Context Engineering Studio in the Thread tab before deploying.
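
To show where relationships sits relative to tables, here is a minimal two-table sketch. It assumes relationships is a top-level key alongside tables, as the example above suggests. The ACCOUNTS table, its description, and its PROD_DB.SALES location are hypothetical, and dimensions, metrics, and filters are left out for brevity.

```yaml
name: SALES_PIPELINE

tables:
  - name: OPPORTUNITIES
    description: "Active and closed sales opportunities tracked in Salesforce."
    base_table:
      database: PROD_DB
      schema: SALES
      table: OPPORTUNITIES

  - name: ACCOUNTS                 # hypothetical second table
    description: "Customer and prospect accounts that own opportunities."
    base_table:
      database: PROD_DB
      schema: SALES
      table: ACCOUNTS

# left_table and right_table must match the name values in tables above.
relationships:
  - name: OPPORTUNITIES_TO_ACCOUNTS
    left_table: OPPORTUNITIES
    right_table: ACCOUNTS
    join_type: LEFT
    relationship_columns:
      - left_column: ACCOUNT_ID
        right_column: ACCOUNT_ID
```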

Many-to-many (Snowflake)

Cortex Analyst semantic views don't directly support many-to-many relationships. If your domain has one, model it with a shared dimension (bridge) table and represent the relationship as two MANY_TO_ONE joins through the bridge. See Snowflake's best practices.
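
A bridge-table layout could look roughly like the sketch below. It assumes a hypothetical OPPORTUNITY_CONTACTS bridge table and a hypothetical CONTACTS table, both already defined in tables, and it uses only the relationship fields documented on this page. The many-to-one direction is expressed by putting the bridge on the left of each join; this page doesn't document a field for declaring cardinality.

```yaml
relationships:
  # Many bridge rows point to one opportunity.
  - name: OPPORTUNITY_CONTACTS_TO_OPPORTUNITIES
    left_table: OPPORTUNITY_CONTACTS     # hypothetical bridge table
    right_table: OPPORTUNITIES
    join_type: LEFT
    relationship_columns:
      - left_column: OPPORTUNITY_ID
        right_column: OPPORTUNITY_ID

  # Many bridge rows point to one contact.
  - name: OPPORTUNITY_CONTACTS_TO_CONTACTS
    left_table: OPPORTUNITY_CONTACTS
    right_table: CONTACTS                # hypothetical table
    join_type: LEFT
    relationship_columns:
      - left_column: CONTACT_ID
        right_column: CONTACT_ID
```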


module_custom_instructions

Optional free-text instructions that Cortex Analyst receives alongside the semantic model when generating SQL.

| Field | Type | Description |
| --- | --- | --- |
| sql_generation | string | Instructions for how Cortex generates SQL: date filter rules, preferred join paths, columns to avoid. |
| question_categorization | string | Instructions for how Cortex interprets question intent, such as disambiguating terms that map to multiple tables. |

Tip: Keep custom instructions focused. Long or contradictory instructions reduce accuracy. Write one rule per instruction and test each with your eval suite.


Naming conventions

| Element | Convention | Example |
| --- | --- | --- |
| Model name | UPPER_SNAKE_CASE | SALES_PIPELINE |
| Table name (alias) | UPPER_SNAKE_CASE | OPPORTUNITIES |
| Dimension / time dimension name | UPPER_SNAKE_CASE | STAGE_NAME |
| Metric name | UPPER_SNAKE_CASE | TOTAL_ARR |
| Filter name | UPPER_SNAKE_CASE | OPEN_OPPORTUNITIES |
| synonyms entries | lowercase | "deal stage", "pipeline stage" |
| Descriptions | Present tense, plain language | "Sum of annual recurring revenue." |

Constraints summary

| Rule | Applies to | Detail |
| --- | --- | --- |
| Name casing | All name fields | UPPER_SNAKE_CASE only. |
| Metric expr starts with aggregate | metrics[].expr | First token must be SUM, AVG, COUNT, MIN, or MAX. |
| Filter expr is boolean | filters[].expr | Must evaluate to TRUE/FALSE. No aggregation functions. |
| synonyms not empty | dimensions, time_dimensions, filters | Provide 2–5 lowercase terms or omit the field entirely. An empty array [] causes validation errors. |
| Synonym entry length | synonyms entries | Each synonym is truncated at 128 characters on deploy. |
| Description length | All description fields | 200 characters maximum. Longer values are truncated on deploy. |
| No nested aggregates | metrics[].expr | Patterns like AVG(SUM(x)) are invalid. |

Validation and autofix

| Issue | Autofix | Manual action required |
| --- | --- | --- |
| synonyms array is empty | Yes, field removed | - |
| Synonym entry exceeds 128 characters | Yes, truncated at 128 characters | Review truncated synonym for meaning |
| Description exceeds 200 characters | Yes, truncated at word boundary | Review truncated text |
| Lowercase metric or dimension names | Yes, converted to UPPER_SNAKE_CASE | - |
| Nested aggregate in metric expr | Yes, flattened to innermost aggregate | Verify the simplified expr is correct |
| Boolean column compared with = 1 | Yes, rewritten to = TRUE | - |
| Column name with backticks or quotes | Yes, quotes stripped | - |
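
As a sketch of what the autofix rows above imply, the fragment below shows a hypothetical input that trips four of the rules, followed by the result the table describes. The exact formatting of real autofix output is an assumption; ACTIVE_ONLY is a hypothetical filter name.

```yaml
# Before validation: lowercase name, backticked column, empty synonyms, = 1 on a boolean.
dimensions:
  - name: stage_name
    expr: "`STAGE_NAME`"
    data_type: VARCHAR
    synonyms: []
filters:
  - name: ACTIVE_ONLY              # hypothetical filter
    expr: "IS_ACTIVE = 1"
---
# After autofix, per the table: name uppercased, backticks stripped,
# empty synonyms field removed, boolean comparison rewritten to = TRUE.
dimensions:
  - name: STAGE_NAME
    expr: STAGE_NAME
    data_type: VARCHAR
filters:
  - name: ACTIVE_ONLY
    expr: "IS_ACTIVE = TRUE"
```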
