Most recent day scans
Large analytic tables often accumulate years of historical data, making complete table scans during every quality check both slow and computationally expensive. Most recent day scans address this challenge by limiting rule evaluation to only the trailing 24 hours of data, focusing monitoring on the most relevant, recently ingested data while avoiding unnecessary scans of historical partitions.
Key concepts
Most recent day scans are built around three key concepts that determine how the filtering works:
Row creation timestamp column
A column in your table that stores the time when each row was inserted or updated
- Determines which rows are considered "recent"
- Shared at the table level across all rules
- Changing the column for one rule updates it for every rule on the table
Rolling 24-hour window
The filter dynamically adjusts to always include the most recent 24 hours relative to the latest timestamp in the table, not the time the rule runs
- Automatically shifts forward as new data arrives
- Ensures you always monitor the freshest data
- No manual configuration updates required
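The rolling behavior can be illustrated with a minimal Python sketch. This is not Atlan's implementation; the function names and the sample load times are hypothetical, and the strict "newer than" comparison is assumed from the `>` operator used in the platform filters documented on this page.

```python
from datetime import datetime, timedelta


def recent_day_window(timestamps, hours=24):
    """Return (start, end) of the window ending at the newest timestamp."""
    end = max(timestamps)
    start = end - timedelta(hours=hours)
    return start, end


def rows_in_window(timestamps, hours=24):
    """Keep only timestamps strictly newer than the window start."""
    start, _ = recent_day_window(timestamps, hours)
    return [ts for ts in timestamps if ts > start]


# Hypothetical load times, for illustration only.
loads = [
    datetime(2024, 3, 6, 9, 0),
    datetime(2024, 3, 7, 18, 0),
    datetime(2024, 3, 8, 17, 0),
]
window_before = recent_day_window(loads)

# A new batch arrives; the window moves forward with no configuration change.
loads.append(datetime(2024, 3, 9, 6, 0))
window_after = recent_day_window(loads)
```

Because the window is derived from the newest timestamp each time, appending a row is enough to move both ends of the window forward.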
Schedule independence
The rule's execution schedule remains unchanged while only the data slice becomes smaller
- Execution schedule stays the same
- Only the evaluated data slice changes
- Faster execution without altering triggers
How it works
When enabled, Atlan automatically adds a WHERE clause to the rule query that filters rows based on a timestamp column you specify. The filter selects rows where the timestamp is within the last 24 hours of the maximum timestamp value in the table.
The filter takes effect as soon as you save the rule with the toggle enabled, and no workflow changes are required. The rule schedule remains unchanged; the filter only limits which rows qualify during each run. Once the filter is active, run history shows reduced row counts and shorter execution times.
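To see the effect of such a filter on a rule's result, the sketch below uses Python's built-in sqlite3 module as a stand-in warehouse. Everything in it is an assumption for illustration: the `orders` table, the `created_at` column, the null-count rule, and the SQLite date syntax, which differs from what Snowflake and Databricks use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, None, "2024-03-01 10:00:00"),  # old row with a null amount
        (2, 20.0, "2024-03-07 16:00:00"),  # just outside the window
        (3, 35.0, "2024-03-07 18:00:00"),
        (4, None, "2024-03-08 17:00:00"),  # newest row, defines the window end
    ],
)

# Hypothetical rule: count rows with a missing amount.
rule_sql = "SELECT COUNT(*) FROM orders WHERE amount IS NULL"

# SQLite spelling of "within 24 hours of the maximum timestamp".
recent_filter = (
    "created_at > (SELECT datetime(MAX(created_at), '-24 hours') FROM orders)"
)

full_scan_nulls = conn.execute(rule_sql).fetchone()[0]
recent_nulls = conn.execute(f"{rule_sql} AND {recent_filter}").fetchone()[0]
recent_rows = conn.execute(
    f"SELECT COUNT(*) FROM orders WHERE {recent_filter}"
).fetchone()[0]
```

In this toy data the unfiltered rule counts the null from the week-old row as well, while the filtered rule evaluates only the two rows in the trailing window.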
The execution follows this logic:
- Timestamp identification: Atlan identifies the maximum timestamp value in your selected row creation timestamp column across the entire table.
- Window calculation: The system calculates the 24-hour window ending at that maximum timestamp value.
- Filter application: A WHERE clause is automatically injected into the rule's SQL query, filtering to include only rows within that 24-hour window. The filtering logic varies by platform:
  - Snowflake: timestamp_col > (SELECT DATEADD(hour, -24, MAX(timestamp_col)) FROM table_name)
  - Databricks: timestamp_col > ((SELECT MAX(timestamp_col) FROM table_name) - INTERVAL 24 HOURS)
- Rule execution: The rule runs on this filtered dataset, maintaining the same schedule you configured. Only the data slice becomes smaller. For example, if your table's most recent row was inserted at 5 PM Friday, the next run (whether on Saturday, Monday, or any other day) scans only rows from the 24 hours before that timestamp: 5 PM Thursday through 5 PM Friday. Rows timestamped at or before 5 PM Thursday are excluded from the scan.
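The filter application step can be sketched as simple string templating over the Snowflake and Databricks predicates listed above. This is a hedged illustration, not how Atlan builds its queries: `recent_day_predicate` and `add_filter` are hypothetical helpers, and the WHERE detection is deliberately naive.

```python
def recent_day_predicate(platform: str, column: str, table: str) -> str:
    """Build the 24-hour predicate in the shape documented for each platform."""
    templates = {
        "snowflake": "{col} > (SELECT DATEADD(hour, -24, MAX({col})) FROM {tbl})",
        "databricks": "{col} > ((SELECT MAX({col}) FROM {tbl}) - INTERVAL 24 HOURS)",
    }
    key = platform.lower()
    if key not in templates:
        raise ValueError(f"unsupported platform: {platform}")
    return templates[key].format(col=column, tbl=table)


def add_filter(rule_sql: str, predicate: str) -> str:
    """Append the predicate, using AND if the query already has a WHERE clause.

    Naive string handling: assumes a single flat SELECT with no subqueries,
    GROUP BY, or ORDER BY after the WHERE clause.
    """
    joiner = " AND " if " where " in f" {rule_sql.lower()} " else " WHERE "
    return rule_sql.rstrip().rstrip(";") + joiner + predicate


# Hypothetical rule queries against an assumed orders table.
snowflake_sql = add_filter(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL",
    recent_day_predicate("snowflake", "created_at", "orders"),
)
databricks_sql = add_filter(
    "SELECT COUNT(*) FROM orders",
    recent_day_predicate("databricks", "created_at", "orders"),
)
```

Both predicates compare against the table's maximum timestamp rather than the current time, which is why the scanned window in the Friday example stays the same no matter which day the rule runs.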
What you get from most recent day scans
Most recent day scans deliver immediate performance and cost benefits by eliminating redundant historical data processing:
- Reduced query execution time: Process only new data instead of the full table, dramatically shortening rule execution times
- Lower compute costs: Minimize warehouse resource consumption on Snowflake and Databricks by avoiding scans of stale partitions
- Focused alerting: Monitor fresh data that's most likely to contain issues in batch-based ingestion workflows, reducing alert noise from historical data
See also
- Run on the most recent day of data: Configuration reference for enabling this optimization
- Rule types and failed rows validations: Understanding how different rule types evaluate data