What's anomaly detection
Anomaly detection is a table-level capability in Data Quality Studio that uses Snowflake's native ML-based anomaly detection to automatically monitor two key table-health metrics: row count (volume) and freshness (timeliness). Instead of requiring you to define static thresholds, the system learns your data's normal patterns over a training period and dynamically determines what constitutes an anomaly.
Core components
| Component | Description |
|---|---|
| Row count monitoring | Tracks the number of rows in your table over time to detect unexpected volume changes |
| Freshness monitoring | Tracks the age of the most recent data to detect unexpected staleness or delays |
| Training period | ~2 weeks of historical data collection before anomaly detection becomes active |
| ML-based bounds | Forecast, upper bound, and lower bound are computed automatically by Snowflake's ML model |
How it works
Anomaly detection builds on Snowflake's native anomaly detection for Data Metric Functions (DMFs). When enabled, the system creates two rules whose metrics are monitored by ML models trained on your table's historical patterns.
1. Enable: You toggle anomaly detection on a Snowflake table from the Data Quality tab. This creates two rules (Anomaly Detection - Row Count and Anomaly Detection - Freshness) and enables anomaly detection on the corresponding DMFs in Snowflake.
2. Train: Snowflake begins collecting metric history and training ML models on your data's patterns. This training period takes approximately two weeks. During this time, the rules show a "Training" status.
3. Detect: Once the model has enough data, it begins producing predictions. Each scheduled run returns:
   - Forecast: the expected value based on historical patterns
   - Upper bound: the upper limit of the expected range
   - Lower bound: the lower limit of the expected range
   - Is anomaly: whether the actual value falls outside the expected range
4. Surface: Results flow through the standard DQ pipeline and appear in Atlan alongside your other rules. If the actual value falls outside the expected range, the rule is marked as failed and alerts fire through your configured notification channels.
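The Detect step above can be sketched as a simple bounds check. This is an illustration only, assuming hypothetical field names; it is not Snowflake's or Atlan's actual result schema:

```python
from dataclasses import dataclass

@dataclass
class AnomalyResult:
    """One scheduled-run result. Field names are illustrative, not the real schema."""
    actual: float       # the measured value (e.g., row count)
    forecast: float     # the expected value based on historical patterns
    lower_bound: float  # lower limit of the expected range
    upper_bound: float  # upper limit of the expected range

    @property
    def is_anomaly(self) -> bool:
        # The value is anomalous when it falls outside the expected range.
        return not (self.lower_bound <= self.actual <= self.upper_bound)

# A row count far below the expected range would be flagged as an anomaly:
result = AnomalyResult(actual=120, forecast=10_000,
                       lower_bound=9_200, upper_bound=10_800)
print(result.is_anomaly)  # → True
```

A value inside the bounds (say, 9,800 rows against the same range) would pass, which is why no manual threshold is needed.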
Status lifecycle
Anomaly detection rules progress through a defined set of states:
| Status | Description |
|---|---|
| Training | The model is collecting data and learning patterns. Takes approximately two weeks after enabling. |
| Active | The model is producing predictions and anomaly results are flowing. |
| Error | Snowflake failed to enable or sync anomaly detection for this rule. Re-toggle to retry. |
The status transitions automatically: from Training to Active when the first anomaly detection results arrive from Snowflake, or to Error if the sync to Snowflake fails.
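The lifecycle above can be summarized as a small state machine. The status names come from the table; the event names and transition function are illustrative assumptions, not a real API:

```python
# Transitions described in the documentation: Training -> Active on first
# results, Training -> Error on sync failure, Error -> Training on re-toggle.
TRANSITIONS = {
    ("Training", "first_results_received"): "Active",
    ("Training", "sync_failed"): "Error",
    ("Error", "re_toggled"): "Training",
}

def next_status(current: str, event: str) -> str:
    # Any event without a defined transition leaves the status unchanged.
    return TRANSITIONS.get((current, event), current)
```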
What makes anomaly detection different from threshold-based rules
Standard DQ rules require you to set a static threshold (for example, "row count must be greater than 1,000"). Anomaly detection removes this requirement by learning what "normal" looks like for your data and flagging deviations automatically.
| Aspect | Threshold-based rules | Anomaly detection |
|---|---|---|
| Configuration | Manual threshold (operator + value) | No threshold needed |
| Adaptability | Static; requires manual updates as data patterns change | Dynamic; the ML model continuously adapts |
| Metrics covered | Any supported rule type | Row count and freshness only |
| Platform support | Snowflake, Databricks, BigQuery | Snowflake only |
| Result details | Pass/fail compared to a fixed value | Pass/fail plus forecast, upper bound, and lower bound |
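The contrast in the table comes down to a fixed comparison versus a learned range. A minimal sketch, with made-up numbers for illustration:

```python
def threshold_check(actual: float, minimum: float) -> bool:
    """Static rule: pass while the value stays above a fixed minimum."""
    return actual >= minimum

def anomaly_check(actual: float, lower: float, upper: float) -> bool:
    """Anomaly detection: pass while the value stays inside the learned range."""
    return lower <= actual <= upper

# A seasonal dip to 800 rows fails a static "must exceed 1,000" rule even when
# that dip is normal, while bounds learned from history can accommodate it.
print(threshold_check(800, 1_000))     # False: below the fixed threshold
print(anomaly_check(800, 650, 1_100))  # True: inside the learned range
```

The static rule needs manual retuning whenever the pattern shifts; the learned bounds are recomputed by the model on each run.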
When to use anomaly detection
Anomaly detection is most useful when:
- Your data volumes or freshness patterns are irregular or seasonal, making static thresholds fragile
- You want monitoring on a table without manually tuning thresholds
- You need early warning for unexpected volume drops, spikes, or delayed data delivery
It complements rather than replaces threshold-based rules. Use both together for comprehensive coverage: anomaly detection for volume and freshness, and threshold-based rules for column-level checks like null counts, uniqueness, and validity.
Limitations
- Snowflake only: anomaly detection relies on Snowflake's native ML capabilities and isn't available for Databricks or BigQuery.
- Two metrics only: currently limited to row count and freshness. Column-level anomaly detection isn't yet supported.
- Training delay: results aren't available until the model completes its ~2-week training period.
See also
- Enable anomaly detection: Step-by-step guide to enable anomaly detection on a Snowflake table
- Rule types and failed rows validations: Reference guide for all available rule types
- Configure alerts: Set up notifications for anomaly detection failures