Compute and cost tracking
Data Quality Studio (DQS) uses platform compute resources to manage rule definitions and results, and run data quality checks on monitored tables. This document explains how DQS uses compute in Snowflake and Databricks, how to track costs for each compute type, and ways to reduce compute costs.
Types of compute
DQS uses different compute resources depending on your platform. Each compute type serves specific functions in the data quality workflow.
Snowflake
Atlan DQS uses three types of Snowflake compute:
| Compute type | Required | Purpose |
| --- | --- | --- |
| Snowflake managed DMF compute | Yes | Executes data metric functions (DMFs) on monitored tables and processes rule results, storing them in Snowflake tables. |
| Warehouse | Yes | Powers SQL queries for control-plane operations, including synchronizing metadata between Atlan and Snowflake, interacting with the Atlan-managed tables that store DQ rules and rule results, and fetching the latest DQ results for display in Atlan. |
| Serverless task | Yes | Processes raw DMF results and runs the rule auto-reattachment flow. |
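For context, a Snowflake data metric function (DMF) is scheduled and attached to a table with statements like the following. Atlan DQS manages this attachment for you, so the example is purely illustrative; the table and column names are hypothetical.
-- Set how often DMFs on the table run, then attach a built-in DMF to a column
ALTER TABLE SALES.PUBLIC.ORDERS SET DATA_METRIC_SCHEDULE = '60 MINUTE';
ALTER TABLE SALES.PUBLIC.ORDERS
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (CUSTOMER_ID);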
Databricks
Atlan DQS uses two types of Databricks compute:
| Compute type | Required | Purpose |
| --- | --- | --- |
| SQL warehouses | Yes | Powers SQL queries for control-plane operations, including synchronizing metadata between Atlan and Databricks, interacting with the Atlan-managed system tables in your workspace that store DQ rules and rule results, and fetching the latest DQ results for display in Atlan. |
| Jobs compute (serverless jobs) | Yes | Runs the data quality execution notebook for each monitored table. Each monitored table corresponds to one Databricks job, and each job run executes the rules attached to that table. Execution time depends on table size, the number of rules on the table, and rule complexity. |
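If you want to confirm which DQ jobs have already incurred billable usage in your workspace, you can list the job names recorded in billing data. This is a minimal sketch that assumes the DQ_RUN_ job-name prefix used in the cost queries later in this document.
SELECT DISTINCT
  usage_metadata.job_name
FROM system.billing.usage
WHERE usage_metadata.job_name LIKE 'DQ_RUN_%'
  AND billing_origin_product = 'JOBS';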
Track compute cost
Track compute costs using platform-specific billing views and tables. Each platform provides different methods to query usage and calculate spend.
Snowflake
Use Snowflake's ACCOUNT_USAGE views to track the credits spent running DQS. Your user needs access to the SNOWFLAKE.ACCOUNT_USAGE schema before running these queries. Adjust time filters and database names as needed for your environment.
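If your role can't see these views yet, an account administrator can grant access. A minimal sketch, assuming a hypothetical role named DQ_ANALYST; either statement is sufficient.
-- Option 1: grant the built-in SNOWFLAKE database role that exposes usage views
GRANT DATABASE ROLE SNOWFLAKE.USAGE_VIEWER TO ROLE DQ_ANALYST;
-- Option 2: grant imported privileges on the shared SNOWFLAKE database
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE DQ_ANALYST;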
Snowflake managed DMF compute
- Overall cost:
SELECT
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY;
- Cost per table:
SELECT
CONCAT(DATABASE_NAME, '.', SCHEMA_NAME, '.', TABLE_NAME) AS ENTITY,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
GROUP BY ENTITY
ORDER BY TOTAL_CREDITS DESC;
- Daily trend:
SELECT
CONCAT(DATABASE_NAME, '.', SCHEMA_NAME, '.', TABLE_NAME) AS ENTITY,
TO_DATE(START_TIME) AS USAGE_DATE,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
GROUP BY USAGE_DATE, ENTITY
ORDER BY USAGE_DATE DESC;
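The queries above aggregate the full history retained in ACCOUNT_USAGE. To bound the window, add a time filter on START_TIME; a minimal sketch for the last 30 days (adjust the interval to your needs):
SELECT
  TO_DATE(START_TIME) AS USAGE_DATE,
  SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
WHERE START_TIME >= DATEADD(DAY, -30, CURRENT_TIMESTAMP())
GROUP BY USAGE_DATE
ORDER BY USAGE_DATE DESC;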
Warehouse
Replace <your_warehouse_name> with the name of your DQ warehouse (for example, COMPUTE_WH).
- Overall cost:
SELECT
NAME,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
WHERE SERVICE_TYPE = 'WAREHOUSE_METERING'
AND NAME = '<your_warehouse_name>'
GROUP BY NAME;
- Daily trend:
SELECT
NAME,
TO_DATE(START_TIME) AS USAGE_DATE,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
WHERE SERVICE_TYPE = 'WAREHOUSE_METERING'
AND NAME = '<your_warehouse_name>'
GROUP BY NAME, USAGE_DATE
ORDER BY USAGE_DATE DESC;
Serverless task
Replace <your_dq_database> with the name of your DQ database (for example, ATLAN_DQ_DQ_DEV).
- Overall cost:
SELECT
CONCAT(DATABASE_NAME, '.', SCHEMA_NAME, '.', TASK_NAME) AS TASK,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
WHERE DATABASE_NAME LIKE '<your_dq_database>'
GROUP BY TASK;
- Daily trend:
SELECT
CONCAT(DATABASE_NAME, '.', SCHEMA_NAME, '.', TASK_NAME) AS TASK,
TO_DATE(START_TIME) AS USAGE_DATE,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
WHERE DATABASE_NAME LIKE '<your_dq_database>'
GROUP BY USAGE_DATE, TASK
ORDER BY USAGE_DATE DESC;
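To compare all three Snowflake compute types in one result, the views above can be combined. A minimal sketch that reuses the same placeholders as the individual queries:
SELECT 'DMF compute' AS COMPUTE_TYPE, SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
UNION ALL
SELECT 'Warehouse', SUM(CREDITS_USED)
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
WHERE SERVICE_TYPE = 'WAREHOUSE_METERING'
  AND NAME = '<your_warehouse_name>'
UNION ALL
SELECT 'Serverless task', SUM(CREDITS_USED)
FROM SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
WHERE DATABASE_NAME LIKE '<your_dq_database>';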
Databricks
Use the Databricks system.billing.usage table to track the DBUs spent running DQS. Your user or service principal needs SELECT permission on system.billing.usage and system.billing.list_prices. Adjust time filters as needed for your environment.
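If the user or service principal can't query these tables yet, a metastore admin can grant access. A minimal sketch, assuming the billing schema is already enabled in your account and a hypothetical group named dq_analysts:
-- Allow the group to reach the system catalog and billing schema, then read the two tables
GRANT USE CATALOG ON CATALOG system TO `dq_analysts`;
GRANT USE SCHEMA ON SCHEMA system.billing TO `dq_analysts`;
GRANT SELECT ON TABLE system.billing.usage TO `dq_analysts`;
GRANT SELECT ON TABLE system.billing.list_prices TO `dq_analysts`;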
Jobs compute
Track DBU consumption and cost for all DQ jobs:
- Overall cost:
SELECT
COUNT(*) AS no_of_dq_runs,
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.job_name LIKE 'DQ_RUN_%'
AND u.billing_origin_product = 'JOBS';
- Cost per monitored table:
SELECT
REGEXP_EXTRACT(u.usage_metadata.job_name, '([^/]+/[^/]+/[^/]+$)', 1) AS table_name,
COUNT(DISTINCT u.usage_metadata.job_run_id) AS no_of_dq_runs,
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.job_name LIKE 'DQ_RUN_%'
AND u.billing_origin_product = 'JOBS'
GROUP BY u.usage_metadata.job_name
ORDER BY total_cost_usd DESC;
- Daily trend:
SELECT
DATE_TRUNC('day', u.usage_start_time) AS day_start,
COUNT(DISTINCT u.usage_metadata.job_run_id) AS no_of_dq_runs,
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.job_name LIKE 'DQ_RUN_%'
AND u.billing_origin_product = 'JOBS'
GROUP BY DATE_TRUNC('day', u.usage_start_time)
ORDER BY day_start;
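To bound any of these queries to a recent window, filter on the usage date. A minimal sketch showing DBU consumption for DQ jobs over the last 30 days (adjust the window to your needs):
SELECT
  u.usage_date,
  SUM(u.usage_quantity) AS total_dbus
FROM system.billing.usage AS u
WHERE u.usage_metadata.job_name LIKE 'DQ_RUN_%'
  AND u.billing_origin_product = 'JOBS'
  AND u.usage_date >= date_sub(current_date(), 30)
GROUP BY u.usage_date
ORDER BY u.usage_date DESC;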
SQL warehouses
Replace <warehouse_id> with the ID of the SQL warehouse you have chosen for DQS.
- Overall cost:
SELECT
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.warehouse_id = '<warehouse_id>'
AND u.billing_origin_product = 'SQL';
- Daily trend:
SELECT
DATE_TRUNC('day', u.usage_start_time) AS day_start,
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.warehouse_id = '<warehouse_id>'
AND u.billing_origin_product = 'SQL'
GROUP BY DATE_TRUNC('day', u.usage_start_time)
ORDER BY day_start;
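To see jobs compute and the SQL warehouse side by side, the two filters above can be combined into a single query. A minimal sketch using the same placeholder:
SELECT
  u.billing_origin_product AS compute_type,
  SUM(u.usage_quantity) AS total_dbus,
  SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
  ON u.cloud = p.cloud
  AND u.sku_name = p.sku_name
  AND u.usage_start_time >= p.price_start_time
  AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE (u.usage_metadata.job_name LIKE 'DQ_RUN_%' AND u.billing_origin_product = 'JOBS')
   OR (u.usage_metadata.warehouse_id = '<warehouse_id>' AND u.billing_origin_product = 'SQL')
GROUP BY u.billing_origin_product;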
Reduce compute cost
Follow these practices to keep DQS compute costs efficient and predictable.
Snowflake
- Start small and scale gradually
  - Start with an X-Small warehouse for DQS control-plane operations.
  - Scale up only if you see consistent performance bottlenecks.
- Configure auto suspend policies on warehouses
  - Set aggressive auto suspend timeouts to avoid paying for idle warehouse time (see the example after this list).
- Schedule DQ runs thoughtfully
  - Run critical tables hourly or daily based on business SLAs.
  - Run non-critical tables weekly or after upstream ETL jobs finish.
- Use incremental DQ monitoring to reduce scanned volume.
- Delete low-value rules that consistently pass or are no longer critical.
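As an example of the auto suspend recommendation above, the setting is a single statement on the DQ warehouse; a minimal sketch using the same <your_warehouse_name> placeholder as the cost queries (60 seconds is an aggressive but common choice):
-- Suspend the DQS warehouse after 60 seconds of inactivity and resume it on demand
ALTER WAREHOUSE <your_warehouse_name> SET
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;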
Regularly reviewing warehouse metering and usage helps you keep DQS costs transparent and under control as your coverage grows.
Databricks
- Start small and scale gradually
  - Start with a 2X-Small serverless SQL warehouse for DQS control-plane operations.
  - Increase the size only if you see consistent performance bottlenecks.
- Configure auto stop policies on SQL warehouses
  - Set aggressive auto stop timeouts to avoid paying for idle warehouse time.
  - Prefer serverless warehouses where available for faster startup times.
- Schedule DQ runs thoughtfully
  - Run critical tables hourly or daily based on business SLAs.
  - Run non-critical tables weekly or after upstream ETL jobs finish.
- Use incremental DQ monitoring to reduce scanned volume.
- Delete low-value rules that consistently pass or are no longer critical.
Regularly reviewing jobs compute and SQL warehouse usage helps you keep DQS costs transparent and under control as your coverage grows.
See also
- Operations: Learn about data quality operations in Snowflake
- Data quality permissions: Understand permission scopes for data quality
- Set up Snowflake: Configure Snowflake for data quality monitoring
- Set up Databricks: Configure Databricks for data quality monitoring