Compute and cost tracking
Data Quality Studio (DQS) uses platform compute resources to manage rule definitions and results, and run data quality checks on monitored tables. This document explains how DQS uses compute in Snowflake and Databricks, how to track costs for each compute type, and ways to reduce compute costs.
Types of compute
DQS uses different compute resources depending on your platform. Each compute type serves specific functions in the data quality workflow.
Snowflake
Atlan DQS uses three types of Snowflake compute:
| Compute type | Required | Purpose |
| --- | --- | --- |
| Snowflake managed DMF compute | Yes | Executes data metric functions (DMFs) on monitored tables and processes rule results, storing them in Snowflake tables. |
| Warehouse | Yes | Powers SQL queries for control-plane operations, including synchronizing metadata between Atlan and Snowflake, interacting with the Atlan-managed tables that store DQ rules and rule results, and fetching the latest DQ results for display in Atlan. |
| Serverless task | Yes | Processes raw DMF results and runs the rule auto-reattachment flow. |
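For context, a Snowflake data metric function (DMF) is scheduled and attached to a table with statements like the following. Atlan DQS manages this attachment for you, so the example is purely illustrative; the table and column names are hypothetical.
-- Set how often DMFs on the table run, then attach a built-in DMF to a column
ALTER TABLE SALES.PUBLIC.ORDERS SET DATA_METRIC_SCHEDULE = '60 MINUTE';
ALTER TABLE SALES.PUBLIC.ORDERS
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (CUSTOMER_ID);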
Databricks
Atlan DQS uses two types of Databricks compute:
| Compute type | Required | Purpose |
| --- | --- | --- |
| SQL warehouses | Yes | Powers SQL queries for control-plane operations, including synchronizing metadata between Atlan and Databricks, interacting with the Atlan-managed system tables in your workspace that store DQ rules and rule results, and fetching the latest DQ results for display in Atlan. |
| Jobs compute (serverless jobs) | Yes | Runs the data quality execution notebook for each monitored table. Each monitored table corresponds to one Databricks job, and each job run executes the rules attached to that table. Execution time depends on table size, the number of rules on the table, and rule complexity. |
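If you want to confirm which DQ jobs have already incurred billable usage in your workspace, you can list the job names recorded in billing data. This is a minimal sketch that assumes the DQ_RUN_ job-name prefix used in the cost queries later in this document.
SELECT DISTINCT
  usage_metadata.job_name
FROM system.billing.usage
WHERE usage_metadata.job_name LIKE 'DQ_RUN_%'
  AND billing_origin_product = 'JOBS';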
Track compute cost
Track compute costs using platform-specific billing views and tables. Each platform provides different methods to query usage and calculate spend.
Snowflake
Use Snowflake's ACCOUNT_USAGE views to track the credits spent running DQS. Your user needs access to the SNOWFLAKE.ACCOUNT_USAGE schema before running these queries. Adjust time filters and database names as needed for your environment.
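If your role can't see these views yet, an account administrator can grant access. A minimal sketch, assuming a hypothetical role named DQ_ANALYST; either statement is sufficient.
-- Option 1: grant the built-in SNOWFLAKE database role that exposes usage views
GRANT DATABASE ROLE SNOWFLAKE.USAGE_VIEWER TO ROLE DQ_ANALYST;
-- Option 2: grant imported privileges on the shared SNOWFLAKE database
GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE DQ_ANALYST;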
Snowflake managed DMF compute
- Overall cost:
SELECT
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY;
- Cost per table:
SELECT
CONCAT(DATABASE_NAME, '.', SCHEMA_NAME, '.', TABLE_NAME) AS ENTITY,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
GROUP BY ENTITY
ORDER BY TOTAL_CREDITS DESC;
- Daily trend:
SELECT
CONCAT(DATABASE_NAME, '.', SCHEMA_NAME, '.', TABLE_NAME) AS ENTITY,
TO_DATE(START_TIME) AS USAGE_DATE,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
GROUP BY USAGE_DATE, ENTITY
ORDER BY USAGE_DATE DESC;
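The queries above aggregate the full history retained in ACCOUNT_USAGE. To bound the window, add a time filter on START_TIME; a minimal sketch for the last 30 days (adjust the interval to your needs):
SELECT
  TO_DATE(START_TIME) AS USAGE_DATE,
  SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
WHERE START_TIME >= DATEADD(DAY, -30, CURRENT_TIMESTAMP())
GROUP BY USAGE_DATE
ORDER BY USAGE_DATE DESC;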
Warehouse
Replace <your_warehouse_name> with the name of your DQ warehouse (for example, COMPUTE_WH).
- Overall cost:
SELECT
NAME,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
WHERE SERVICE_TYPE = 'WAREHOUSE_METERING'
AND NAME = '<your_warehouse_name>'
GROUP BY NAME;
- Daily trend:
SELECT
NAME,
TO_DATE(START_TIME) AS USAGE_DATE,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
WHERE SERVICE_TYPE = 'WAREHOUSE_METERING'
AND NAME = '<your_warehouse_name>'
GROUP BY NAME, USAGE_DATE
ORDER BY USAGE_DATE DESC;
Serverless task
Replace <your_dq_database> with the name of your DQ database (for example, ATLAN_DQ_DQ_DEV).
- Overall cost:
SELECT
CONCAT(DATABASE_NAME, '.', SCHEMA_NAME, '.', TASK_NAME) AS TASK,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
WHERE DATABASE_NAME LIKE '<your_dq_database>'
GROUP BY TASK;
- Daily trend:
SELECT
CONCAT(DATABASE_NAME, '.', SCHEMA_NAME, '.', TASK_NAME) AS TASK,
TO_DATE(START_TIME) AS USAGE_DATE,
SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
WHERE DATABASE_NAME LIKE '<your_dq_database>'
GROUP BY USAGE_DATE, TASK
ORDER BY USAGE_DATE DESC;
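To compare all three Snowflake compute types in one result, the views above can be combined. A minimal sketch that reuses the same placeholders as the individual queries:
SELECT 'DMF compute' AS COMPUTE_TYPE, SUM(CREDITS_USED) AS TOTAL_CREDITS
FROM SNOWFLAKE.ACCOUNT_USAGE.DATA_QUALITY_MONITORING_USAGE_HISTORY
UNION ALL
SELECT 'Warehouse', SUM(CREDITS_USED)
FROM SNOWFLAKE.ACCOUNT_USAGE.METERING_HISTORY
WHERE SERVICE_TYPE = 'WAREHOUSE_METERING'
  AND NAME = '<your_warehouse_name>'
UNION ALL
SELECT 'Serverless task', SUM(CREDITS_USED)
FROM SNOWFLAKE.ACCOUNT_USAGE.SERVERLESS_TASK_HISTORY
WHERE DATABASE_NAME LIKE '<your_dq_database>';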
Databricks
Use the Databricks system.billing.usage table to track the DBUs spent running DQS. Your user or service principal needs SELECT permission on system.billing.usage and system.billing.list_prices. Adjust time filters as needed for your environment.
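If the user or service principal can't query these tables yet, a metastore admin can grant access. A minimal sketch, assuming the billing schema is already enabled in your account and a hypothetical group named dq_analysts:
-- Allow the group to reach the system catalog and billing schema, then read the two tables
GRANT USE CATALOG ON CATALOG system TO `dq_analysts`;
GRANT USE SCHEMA ON SCHEMA system.billing TO `dq_analysts`;
GRANT SELECT ON TABLE system.billing.usage TO `dq_analysts`;
GRANT SELECT ON TABLE system.billing.list_prices TO `dq_analysts`;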
Jobs compute
Track DBU consumption and cost for all DQ jobs:
- Overall cost:
SELECT
COUNT(*) AS no_of_dq_runs,
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.job_name LIKE 'DQ_RUN_%'
AND u.billing_origin_product = 'JOBS';
- Cost per monitored table:
SELECT
REGEXP_EXTRACT(u.usage_metadata.job_name, '([^/]+/[^/]+/[^/]+$)', 1) AS table_name,
COUNT(DISTINCT u.usage_metadata.job_run_id) AS no_of_dq_runs,
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.job_name LIKE 'DQ_RUN_%'
AND u.billing_origin_product = 'JOBS'
GROUP BY u.usage_metadata.job_name
ORDER BY total_cost_usd DESC;
- Daily trend:
SELECT
DATE_TRUNC('day', u.usage_start_time) AS day_start,
COUNT(DISTINCT u.usage_metadata.job_run_id) AS no_of_dq_runs,
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.job_name LIKE 'DQ_RUN_%'
AND u.billing_origin_product = 'JOBS'
GROUP BY DATE_TRUNC('day', u.usage_start_time)
ORDER BY day_start;
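To bound any of these queries to a recent window, filter on the usage date. A minimal sketch showing DBU consumption for DQ jobs over the last 30 days (adjust the window to your needs):
SELECT
  u.usage_date,
  SUM(u.usage_quantity) AS total_dbus
FROM system.billing.usage AS u
WHERE u.usage_metadata.job_name LIKE 'DQ_RUN_%'
  AND u.billing_origin_product = 'JOBS'
  AND u.usage_date >= date_sub(current_date(), 30)
GROUP BY u.usage_date
ORDER BY u.usage_date DESC;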
SQL warehouses
Replace <warehouse_id> with the ID of the SQL warehouse you have chosen for DQS.
- Overall cost:
SELECT
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.warehouse_id = '<warehouse_id>'
AND u.billing_origin_product = 'SQL';
- Daily trend:
SELECT
DATE_TRUNC('day', u.usage_start_time) AS day_start,
SUM(u.usage_quantity) AS total_dbus,
SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON u.cloud = p.cloud
AND u.sku_name = p.sku_name
AND u.usage_start_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE u.usage_metadata.warehouse_id = '<warehouse_id>'
AND u.billing_origin_product = 'SQL'
GROUP BY DATE_TRUNC('day', u.usage_start_time)
ORDER BY day_start;
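To see jobs compute and the SQL warehouse side by side, the two filters above can be combined into a single query. A minimal sketch using the same placeholder:
SELECT
  u.billing_origin_product AS compute_type,
  SUM(u.usage_quantity) AS total_dbus,
  SUM(u.usage_quantity * p.pricing.default) AS total_cost_usd
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
  ON u.cloud = p.cloud
  AND u.sku_name = p.sku_name
  AND u.usage_start_time >= p.price_start_time
  AND (p.price_end_time IS NULL OR u.usage_end_time <= p.price_end_time)
WHERE (u.usage_metadata.job_name LIKE 'DQ_RUN_%' AND u.billing_origin_product = 'JOBS')
   OR (u.usage_metadata.warehouse_id = '<warehouse_id>' AND u.billing_origin_product = 'SQL')
GROUP BY u.billing_origin_product;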
Reduce compute cost
Follow these practices to keep DQS compute costs efficient and predictable.
Snowflake
- Start small and scale gradually
  - Start with an X-Small warehouse for DQS control-plane operations.
  - Scale up only if you see consistent performance bottlenecks.
- Configure auto suspend policies on warehouses
  - Set aggressive auto suspend timeouts to avoid paying for idle warehouse time (see the example after this list).
- Schedule DQ runs thoughtfully
  - Run critical tables hourly or daily based on business SLAs.
  - Run non-critical tables weekly or after upstream ETL jobs finish.
- Use incremental DQ monitoring to reduce scanned volume.
- Delete low-value rules that consistently pass or are no longer critical.
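As an example of the auto suspend recommendation above, the setting is a single statement on the DQ warehouse; a minimal sketch using the same <your_warehouse_name> placeholder as the cost queries (60 seconds is an aggressive but common choice):
-- Suspend the DQS warehouse after 60 seconds of inactivity and resume it on demand
ALTER WAREHOUSE <your_warehouse_name> SET
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;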
Regularly reviewing warehouse metering and usage helps you keep DQS costs transparent and under control as your coverage grows.
Databricks
- Start small and scale gradually
  - Start with a 2X-Small serverless SQL warehouse for DQS control-plane operations.
  - Increase the size only if you see consistent performance bottlenecks.
- Configure auto stop policies on SQL warehouses
  - Set aggressive auto stop timeouts to avoid paying for idle warehouse time.
  - Prefer serverless warehouses where available for faster startup times.
- Schedule DQ runs thoughtfully
  - Run critical tables hourly or daily based on business SLAs.
  - Run non-critical tables weekly or after upstream ETL jobs finish.
- Use incremental DQ monitoring to reduce scanned volume.
- Delete low-value rules that consistently pass or are no longer critical.
Regularly reviewing jobs compute and SQL warehouse usage helps you keep DQS costs transparent and under control as your coverage grows.
See also
- Operations: Learn about data quality operations in Snowflake
- Data quality permissions: Understand permission scopes for data quality
- Set up Snowflake: Configure Snowflake for data quality monitoring
- Set up Databricks: Configure Databricks for data quality monitoring