
Databricks Private Preview

This page summarizes the Databricks-specific behavior of Context Engineering Studio: what you need before you start, how CES authenticates, the deploy-first requirement, the artifacts CES creates on Databricks, and the Genie-specific runtime constraints that shape how you scope and simulate.

Prerequisites

Before you create a Databricks context repository, make sure these are in place. These align with Databricks's Genie setup requirements and the Metric View requirements.

  • CES is enabled on your tenant and your team has the CES persona. See Setup.
  • A Databricks connection exists in Atlan, using Service Principal + OAuth M2M credentials. PAT is supported as a fallback but not recommended for production: PATs are tied to human users, get rate-limited under load, and are rejected by most enterprise security teams.
  • Partner-powered AI is enabled at both the account and workspace levels. Without it, Genie is unavailable even to users who have the SQL entitlement. An account administrator enables this.
  • Unity Catalog is the source of truth for the data CES operates on. Metric Views and Genie Spaces require UC-registered assets.
  • Permissions are granted on the Atlan service principal. See Grant Databricks permissions. At minimum:
    • Databricks SQL workspace entitlement and workspace-member status.
    • USE CATALOG on the target catalog and USE SCHEMA on the target schema.
    • CREATE TABLE on the target schema (in Unity Catalog, CREATE TABLE is the privilege for creating views and metric views; there is no separate CREATE VIEW).
    • SELECT on the underlying source tables and views.
    • MODIFY (or ownership) on source tables, so CES can push Atlan descriptions into Unity Catalog COMMENTs.
    • CAN USE on the pro or serverless SQL warehouse Genie uses.
  • The SQL warehouse is pro or serverless and runs Databricks Runtime 17.3 or later. Required to create and query Metric Views.
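The grants above can be sketched as SQL run by a workspace admin. This is illustrative only: the catalog, schema, table, and principal names are placeholders, and the warehouse CAN USE permission is granted in the workspace UI rather than via SQL.

```sql
-- Illustrative grants for the Atlan service principal (placeholder names).
-- Replace `atlan-sp` with the service principal's application ID.
GRANT USE CATALOG  ON CATALOG analytics              TO `atlan-sp`;
GRANT USE SCHEMA   ON SCHEMA  analytics.sales        TO `atlan-sp`;
GRANT CREATE TABLE ON SCHEMA  analytics.sales        TO `atlan-sp`;  -- also covers views and metric views
GRANT SELECT       ON TABLE   analytics.sales.orders TO `atlan-sp`;
GRANT MODIFY       ON TABLE   analytics.sales.orders TO `atlan-sp`;  -- needed to push COMMENTs
-- CAN USE on the SQL warehouse is a workspace permission, set in the UI, not via SQL.
```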

Unlike Snowflake, Databricks connections don't have an automated preflight check yet. Verify each grant manually; see Verify permissions.

Authentication

CES authenticates to Databricks using Service Principal + OAuth M2M, the enterprise standard for production integrations. The service principal is used for:

  • Catalog crawling (Unity Catalog schemas, lineage, query history).
  • Pushing business descriptions into Unity Catalog COMMENT on tables and columns.
  • CREATE OR REPLACE VIEW ... WITH METRICS LANGUAGE YAML AS $$...$$ at deploy time.
  • Creating and configuring Genie Spaces via POST /api/2.0/genie/spaces.
  • Invoking the Genie conversation API during Chat & build and Simulate.

Set up a service principal

  1. Create a service principal in the Databricks account console (or via SCIM API).
  2. Generate an OAuth secret (client ID + client secret).
  3. Add the service principal to the target workspaces.
  4. Grant the permissions listed earlier.
  5. Configure the Atlan Databricks connection with the workspace URL plus the OAuth M2M credentials.

Workspace URL format: https://<workspace-instance>.cloud.databricks.com/. Copy it from your browser's address bar when signed in to the Databricks workspace.
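Under the hood, the OAuth M2M flow is a standard client-credentials token exchange against the workspace's /oidc/v1/token endpoint with the all-apis scope, per Databricks's documented M2M flow. The helper below is a minimal sketch that only builds the request (it does not send it); the function name is illustrative.

```python
import base64

def build_token_request(workspace_url: str, client_id: str, client_secret: str):
    """Return (url, headers, body) for an OAuth client-credentials grant.

    Illustrative helper: builds the token request for Databricks's M2M flow
    but leaves sending it to your HTTP client of choice.
    """
    # Client ID and secret are passed via HTTP Basic auth.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    url = f"{workspace_url.rstrip('/')}/oidc/v1/token"
    headers = {
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = "grant_type=client_credentials&scope=all-apis"
    return url, headers, body
```

The returned token is then sent as a Bearer token on subsequent API calls.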

Deploy-first requirement

Unlike Snowflake, Databricks requires a deployment before Chat & build and Simulate work. The Metric Views and Genie Space are what Chat & build and Simulate use; they can't run on an in-memory YAML.

The flow is:

  1. Build: describe the domain, select assets, refine the column selection, and save the repository as a draft.
  2. First deploy: CES creates the Metric Views and the Genie Space. This is both a setup step and a release step.
  3. Iterate: Chat & build, Simulate, and YAML refinement all use the live Genie Space.
  4. Re-deploy: subsequent deploys run CREATE OR REPLACE VIEW on each Metric View and update the Genie Space in place, so iteration is cheap.

Plan for this in your workflow: the first deploy is part of the build cycle, not the release cycle.

Deployed artifacts

When you deploy, CES runs this sequence in order:

1. Push Unity Catalog table and column comments

CES writes Atlan business descriptions into Unity Catalog COMMENT ON TABLE and COMMENT ON COLUMN statements for every table in scope. Genie reads these at query time as runtime context; this is one of the highest-ROI accuracy inputs.

Descriptions are updated, not replaced. Existing non-Atlan comments on the same objects are preserved where possible.

2. Metric Views, one per table, one DDL per table

CES creates an AI/BI Metric View for each table in the repository. Each view is a separate CREATE OR REPLACE VIEW ... WITH METRICS LANGUAGE YAML AS $$...$$ statement; CES executes them one at a time rather than as a single multi-statement batch, to make error diagnosis cleaner. The YAML format follows Databricks's Metric View schema.

CREATE OR REPLACE VIEW <catalog>.<schema>.<view_prefix>_<table_alias>
WITH METRICS LANGUAGE YAML
AS $$
version: 1.1
comment: <table_description>
source: <catalog>.<schema>.<table_name>
filter: <optional_default_where_predicate>
dimensions:
  - name: <dimension_name>
    expr: <column_name>
    comment: <dimension_description>
    synonyms: [<synonym_1>, <synonym_2>]
measures:
  - name: <measure_name>
    expr: SUM(<column_name>)
    comment: <measure_description>
$$;

Metric View YAML uses comment (not description) on measures and dimensions, supports synonyms, and doesn't support data_type or sample_values.
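For concreteness, here is what the compiled DDL might look like for a hypothetical orders table; every name and expression below is illustrative, not output CES actually produced.

```sql
-- Hypothetical compiled Metric View for an orders table (illustrative names).
CREATE OR REPLACE VIEW analytics.sales.ces_orders
WITH METRICS LANGUAGE YAML
AS $$
version: 1.1
comment: Customer orders, one row per order.
source: analytics.sales.orders
filter: order_status <> 'TEST'
dimensions:
  - name: order_date
    expr: order_date
    comment: Calendar date the order was placed.
    synonyms: [purchase date, transaction date]
measures:
  - name: total_revenue
    expr: SUM(order_amount)
    comment: Gross order revenue before refunds.
$$;
```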

What lives where:

| Artifact | Lives in | Primarily contains |
| --- | --- | --- |
| Metric View (per table) | Unity Catalog view DDL | Measures, dimensions, and an optional default filter scoped to that table |
| Genie Space | Databricks workspace | Cross-table relationships, text instructions, example SQL queries, verified answers, sample questions |

Both surfaces can carry filter-like context: table-level defaults sit in the Metric View YAML filter field; cross-cutting rules that apply across tables are expressed in the Genie Space instructions. CES compiles the repository YAML to both on deploy.

3. Genie Space configuration

CES creates and configures a Genie Space on your workspace, populated with:

  • Metric Views from step 2, added as the space's data sources.
  • Join relationships between tables, with cardinality annotations (MANY_TO_ONE, ONE_TO_MANY, ONE_TO_ONE, MANY_TO_MANY).
  • Default filters from the semantic model (for example, "exclude internal test accounts").
  • Instructions, text guidance that can't be expressed as SQL (for example, "round percentages to two decimals").
  • Example SQL queries promoted from your question set, so Genie can learn the shape of correct answers.
  • Verified answers from simulations that passed, used as few-shot examples at query time.

Defaults you can override:

  • Name: Context Studio - <catalog>.<schema> by default, overridable at deploy time.
  • Location: /Workspace/Shared by default.
  • Scope: a single repository maps to one Genie Space per tenant. To deploy the same repository to multiple Spaces, save it under different names.
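The overridable defaults can be captured in a small helper; this is a hypothetical sketch mirroring the documented defaults, not CES's actual code.

```python
def default_space_config(catalog: str, schema: str) -> dict:
    """Hypothetical helper returning CES's documented Genie Space defaults.

    Both values are overridable at deploy time.
    """
    return {
        "name": f"Context Studio - {catalog}.{schema}",  # default space name
        "location": "/Workspace/Shared",                 # default workspace path
    }
```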

The Genie Space creator's compute credentials are embedded into the space and used to process all users' queries. End users don't need direct warehouse permissions.

Re-deploying repositories

Re-deploys are idempotent and update the live artifacts in place:

  • Metric Views are updated via CREATE OR REPLACE VIEW ... WITH METRICS LANGUAGE YAML.
  • The companion Genie Space is patched via Databricks's Genie API (PATCH /api/2.0/genie/spaces/{id}), so data sources, instructions, verified answers, and example queries update without recreating the space.

See the DDL reference for the full compiled output.

Simulate on Databricks

Simulations on Databricks use a live Genie Space. A few behaviors shape expectations:

Sequential runs, not parallel

Genie has workspace-level throughput limits (see below). CES runs Databricks simulations sequentially with built-in pacing, not in parallel, so expect longer wall-clock time than a Snowflake simulation on the same question set.
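The pacing can be sketched as evenly spacing requests to stay under a per-minute quota; this is an assumed implementation shape, not CES's actual scheduler.

```python
def pacing_schedule(n_questions: int, qpm: int) -> list:
    """Offsets (in seconds) at which each question may be sent.

    Assumed pacing sketch: requests are spaced at least 60/qpm seconds
    apart so a sequential run never exceeds the per-minute quota.
    """
    interval = 60.0 / qpm
    return [i * interval for i in range(n_questions)]
```

At the 5-queries-per-minute API default, a 50-question set spreads its sends over roughly ten minutes before any answer or judging time is counted.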

Two throughput budgets apply

Databricks publishes two distinct throughput figures that both apply to simulations:

  • Genie UI: up to 20 questions per minute per workspace, shared across all Genie Spaces and all clients in that workspace, per Databricks's Genie setup docs.
  • Genie API (default quota): 5 queries per minute per workspace. Large test sets take longer on Databricks than on Snowflake.

Your Databricks account team can raise both quotas on request.

Where judging runs

On Databricks, the simulation judge runs through Atlan AI. Atlan AI receives the natural-language question plus both SQL statements (generated and verified) for semantic-equivalence judging; Atlan AI's security model operates on metadata and synthetic examples without direct access to warehouse data.

Polling is lightweight

CES polls for completion every 1–5 seconds with exponential backoff and times out after 10 minutes per question.
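A backoff matching that description might look like the generator below; the exact delay curve is an assumption consistent with the stated 1–5 second range and 10-minute cap.

```python
def poll_delays(start: float = 1.0, cap: float = 5.0, timeout: float = 600.0):
    """Yield polling delays: exponential growth from `start`, capped at `cap`,
    stopping once the cumulative wait would exceed `timeout`.

    Assumed shape matching the documented 1-5 s backoff and 10-minute limit.
    """
    elapsed, delay = 0.0, start
    while elapsed + delay <= timeout:
        yield delay
        elapsed += delay
        delay = min(delay * 2, cap)
```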

Genie constraints

Scope a repository with Databricks's Genie best practices in mind.

Hard limits (from Databricks)

| Constraint | Limit |
| --- | --- |
| Tables/views per Genie Space | 30 |
| Workspace throughput | 20 questions per minute across all Genie Spaces |
| Conversations per space | 10,000 |
| Messages per conversation | 10,000 |
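A pre-deploy scope check against these limits might look like this; the function is a hypothetical sketch combining the 30-table hard cap with the five-table accuracy guidance below.

```python
def check_scope(n_tables: int) -> str:
    """Classify a repository's table count against Genie Space limits.

    30 is Databricks's hard cap per space; 5 is the accuracy-guidance target.
    """
    if n_tables > 30:
        return "error: exceeds the 30 tables/views per Genie Space hard limit"
    if n_tables > 5:
        return "warning: more than 5 tables tends to reduce answer accuracy"
    return "ok"
```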

Accuracy guidance

  • Aim for five or fewer tables per space. Databricks's own best-practice guidance is to "stay focused": a tight selection beats a wide one. 30 is the hard cap; if you need more, pre-join related tables into views or Metric Views before adding them to the space.
  • Limit columns. Hide non-essential columns; unnecessary columns consume context and reduce quality.
  • Well-annotated tables matter most. Clear Unity Catalog column names and comments are the top accuracy driver; this is why CES pushes Atlan descriptions into Unity Catalog COMMENTs on deploy.
  • Use SQL expressions for reusable business terms. Define revenue, active_customers, etc. as SQL measures in the Metric View.
  • Use example SQL queries for complex or ambiguous prompts. Databricks recommends example queries "to teach Genie how to handle common ambiguous prompts."
  • Use text instructions sparingly. Databricks guidance: "only when SQL expressions and examples cannot address the need."

Non-deterministic output

The same prompt can produce different outputs across runs. Providing example SQL (promoted from your question set) is the most reliable way to stabilize behavior.

Iterate post-deploy

CES's Observe tab, which promotes live production traces into the question set and suggests fixes from real user failures, is currently available only for Snowflake Cortex deployments.

On Databricks, close the feedback loop by:

  • Reviewing Genie's own conversation history directly in the Databricks workspace.
  • Adding representative failures back into your CES question set manually and re-running Simulate.

End-user access

After deploy, end users need, per Databricks's Genie access requirements:

  • Consumer access or the Databricks SQL workspace entitlement.
  • CAN VIEW or CAN RUN on the Genie Space (via Databricks workspace UI → Settings → Permissions on the space). CAN RUN lets users ask questions; CAN VIEW lets them see conversations only.
  • SELECT on all Unity Catalog data objects used in the space. Users only see rows they have permission to access; Genie respects UC row- and column-level security.

End users don't need direct SQL warehouse permissions. Queries run under the creator's embedded compute credentials.

Troubleshooting

For Databricks-specific errors (OAuth failures, Metric View creation errors, Genie Space configuration errors, rate-limit errors, token-limit warnings), see Troubleshooting Databricks.

Next steps