What's anomaly detection

Anomaly detection is a table-level capability in Data Quality Studio that uses Snowflake's native ML-based anomaly detection to automatically monitor two key table-health metrics: row count (volume) and freshness (timeliness). Instead of requiring you to define static thresholds, the system learns your data's normal patterns over a training period and dynamically determines what constitutes an anomaly.

Core components

Row count monitoring

Tracks the number of rows in your table over time to detect unexpected volume changes

Freshness monitoring

Tracks the age of the most recent data to detect unexpected staleness or delays

Training period

~2 weeks of historical data collection before anomaly detection becomes active

ML-based bounds

Forecast, upper bound, and lower bound are computed automatically by Snowflake's ML model

How it works

Anomaly detection leverages Snowflake's built-in anomaly detection capabilities on Data Metric Functions (DMFs). When enabled, the system creates two rules that are monitored using ML models trained on your data's historical patterns.

  1. Enable: You toggle anomaly detection on a Snowflake table from the Data Quality tab. This creates two rules (Anomaly Detection - Row Count and Anomaly Detection - Freshness) and enables anomaly detection on the corresponding DMFs in Snowflake.

  2. Train: Snowflake begins collecting metric history and training ML models on your data's patterns. This training period takes approximately two weeks. During this time, the rules show a "Training" status.

  3. Detect: Once the model has enough data, it begins producing predictions. Each scheduled run returns:

    • Forecast: the expected value based on historical patterns
    • Upper bound: the upper limit of the expected range
    • Lower bound: the lower limit of the expected range
    • Is anomaly: whether the actual value falls outside the expected range
  4. Surface: Results flow through the standard DQ pipeline and appear in Atlan alongside your other rules. If the actual value falls outside the expected range, the rule is marked as failed and alerts fire through your configured notification channels.
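The detect-and-surface logic above can be sketched in a few lines. This is a hypothetical illustration only: the `AnomalyResult` class and its field names are not part of Atlan or Snowflake, but they mirror the per-run outputs described in step 3 (forecast, upper bound, lower bound, is anomaly).

```python
from dataclasses import dataclass

@dataclass
class AnomalyResult:
    """Illustrative container for one scheduled run's outputs."""
    actual: float       # observed metric value (e.g., row count)
    forecast: float     # expected value from the ML model
    lower_bound: float  # lower limit of the expected range
    upper_bound: float  # upper limit of the expected range

    @property
    def is_anomaly(self) -> bool:
        # Anomalous when the actual value falls outside the expected range;
        # in that case the rule is marked as failed and alerts fire.
        return not (self.lower_bound <= self.actual <= self.upper_bound)

# A volume drop: the actual row count falls far below the expected range.
run = AnomalyResult(actual=120, forecast=1000, lower_bound=850, upper_bound=1150)
print(run.is_anomaly)  # True -> rule marked as failed
```

Note that the model returns a range, not a single cutoff: a value is healthy anywhere between the lower and upper bounds, which is what lets the check tolerate normal day-to-day variation.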

Status lifecycle

Anomaly detection rules progress through a defined set of states:

Status | Description
------ | -----------
Training | The model is collecting data and learning patterns. Takes approximately two weeks after enabling.
Active | The model is producing predictions and anomaly results are flowing.
Error | Snowflake failed to enable or sync anomaly detection for this rule. Re-toggle to retry.

The status transitions automatically: from Training to Active when the first anomaly detection results arrive from Snowflake, or to Error if the sync to Snowflake fails.
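The automatic transitions can be sketched as a tiny state machine. The status names come from the table above; the transition function itself is a hypothetical simplification for illustration, not Atlan's implementation.

```python
# Statuses from the lifecycle table.
TRAINING, ACTIVE, ERROR = "Training", "Active", "Error"

def next_status(current: str, first_results_arrived: bool, sync_failed: bool) -> str:
    """Advance a rule's status based on events from Snowflake."""
    if current == TRAINING:
        if sync_failed:
            return ERROR    # sync to Snowflake failed; re-toggle to retry
        if first_results_arrived:
            return ACTIVE   # first anomaly detection results arrived
    return current          # otherwise, no transition

print(next_status(TRAINING, first_results_arrived=True, sync_failed=False))  # Active
print(next_status(TRAINING, first_results_arrived=False, sync_failed=True))  # Error
```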

What makes anomaly detection different from threshold-based rules

Standard DQ rules require you to set a static threshold (for example, "row count must be greater than 1,000"). Anomaly detection removes this requirement by learning what "normal" looks like for your data and flagging deviations automatically.

Aspect | Threshold-based rules | Anomaly detection
------ | --------------------- | -----------------
Configuration | Manual threshold (operator + value) | No threshold needed
Adaptability | Static; requires manual updates as data patterns change | Dynamic; the ML model continuously adapts
Metrics covered | Any supported rule type | Row count and freshness only
Platform support | Snowflake, Databricks, BigQuery | Snowflake only
Result details | Pass/fail compared to a fixed value | Pass/fail plus forecast, upper bound, and lower bound
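The first two rows of the comparison can be made concrete with a hedged side-by-side sketch. Both function names are illustrative, not real APIs; they show why a static threshold breaks on seasonal data while learned bounds do not.

```python
def threshold_check(actual: float, operator: str, value: float) -> bool:
    """Static rule: pass/fail against a fixed, manually chosen value."""
    ops = {">": actual > value, ">=": actual >= value,
           "<": actual < value, "<=": actual <= value}
    return ops[operator]

def anomaly_check(actual: float, lower: float, upper: float) -> bool:
    """ML-based rule: pass/fail against bounds the model learned."""
    return lower <= actual <= upper

# A normal seasonal low of 900 rows fails a static "row count > 1,000"
# rule, but passes anomaly detection if the model has learned that 900
# is within the expected range for this period.
print(threshold_check(900, ">", 1000))            # False (rule fails)
print(anomaly_check(900, lower=800, upper=1200))  # True (rule passes)
```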

When to use anomaly detection

Anomaly detection is most useful when:

  • Your data volumes or freshness patterns are irregular or seasonal, making static thresholds fragile
  • You want monitoring on a table without manually tuning thresholds
  • You need early warning for unexpected volume drops, spikes, or delayed data delivery

It complements rather than replaces threshold-based rules. Use both together for comprehensive coverage: anomaly detection for volume and freshness, and threshold-based rules for column-level checks like null counts, uniqueness, and validity.

Limitations

  • Snowflake only: anomaly detection relies on Snowflake's native ML capabilities and isn't available for Databricks or BigQuery.
  • Two metrics only: currently limited to row count and freshness. Column-level anomaly detection isn't yet supported.
  • Training delay: results aren't available until the model completes its ~2-week training period.

See also