Database usage and cost analysis (Private Preview)

Analyze database usage, optimize query performance, and manage storage and compute costs across your data warehouse platforms using the Lakehouse (MDLH).

This page provides SQL templates and guidance for cost and usage optimization across Snowflake, BigQuery, and Databricks.

Before you begin

You can run SQL queries directly against the TABLE_ENTITY table (and related metadata tables) in the Lakehouse to analyze cost and performance metrics. The sections below provide example queries for common cost optimization scenarios; a minimal starter query follows.
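
To confirm connectivity before running the optimization queries, a minimal starter query (using only columns documented in the reference section below) looks like this:

-- Sanity check: list a few tables visible in the Lakehouse
SELECT
    NAME,
    QUALIFIEDNAME,
    CONNECTORNAME
FROM TABLE_ENTITY
LIMIT 10;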

Identify stale and unused tables

Find tables that haven't been accessed or updated recently but still incur high storage or compute costs. Use this to identify candidates for archiving, cleanup, or cost optimization.

SELECT
    NAME,
    QUALIFIEDNAME,
    SIZEBYTES,
    SOURCETOTALCOST,
    TO_VARCHAR(TO_TIMESTAMP_LTZ(SOURCEUPDATEDAT / 1000), 'YYYY-MM-DD HH24:MI:SS') AS LAST_UPDATED,
    TO_VARCHAR(TO_TIMESTAMP_LTZ(SOURCELASTREADAT / 1000), 'YYYY-MM-DD HH24:MI:SS') AS LAST_READ
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
    AND (SOURCEUPDATEDAT IS NULL OR SOURCEUPDATEDAT < DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP())))
    AND (SOURCELASTREADAT IS NULL OR SOURCELASTREADAT < DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP())))
ORDER BY SOURCETOTALCOST DESC, SIZEBYTES DESC
LIMIT 20;

This query:

  • Returns tables that haven't been updated or read in the last 30 days
  • Sorts results by total cost and then by size
  • Helps you prioritize assets for archiving or cleanup
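
To size the overall cleanup opportunity rather than list individual tables, you can aggregate over the same filter. This is a minimal sketch reusing the columns above; the 30-day window and 'snowflake' connector are placeholders to adapt:

-- Estimate the total storage and cost tied up in stale tables
SELECT
    COUNT(*) AS STALE_TABLE_COUNT,
    SUM(SIZEBYTES) / POWER(1024, 3) AS STALE_STORAGE_GB,
    SUM(SOURCETOTALCOST) AS STALE_TOTAL_COST
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
    AND (SOURCEUPDATEDAT IS NULL OR SOURCEUPDATEDAT < DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP())))
    AND (SOURCELASTREADAT IS NULL OR SOURCELASTREADAT < DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP())));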

Optimize high-cost queries

Identify assets that generate high query costs relative to their usage frequency. Use this to prioritize optimization efforts, implement caching strategies, or identify underutilized expensive resources.

-- Find tables with high query cost and low read frequency
SELECT
    NAME,
    QUALIFIEDNAME,
    SOURCEREADCOUNT,
    SOURCEREADQUERYCOST,
    TO_VARCHAR(TO_TIMESTAMP_LTZ(SOURCELASTREADAT / 1000), 'YYYY-MM-DD HH24:MI:SS') AS LAST_READ
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
ORDER BY SOURCEREADQUERYCOST DESC, SOURCEREADCOUNT ASC
LIMIT 20;

Use this query to:

  • Surface cost-heavy tables with low read frequency
  • Target tables for query optimization, caching, or pruning

-- Find high-activity tables that drive compute costs
SELECT
    NAME,
    QUALIFIEDNAME,
    SOURCEREADCOUNT,
    SOURCEREADQUERYCOST
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
    AND SOURCEREADCOUNT > 100
    AND SOURCEREADQUERYCOST > 100
    AND SOURCELASTREADAT >= DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP()))
ORDER BY SOURCEREADQUERYCOST DESC;

Use this query to:

  • Identify high-impact tables driving compute costs in the last 30 days
  • Prioritize optimization or caching strategies for frequently accessed expensive tables
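
A related derived metric is cost per read, which separates tables that are expensive because they are queried often from tables that are expensive per query. A minimal sketch; COST_PER_READ is an illustrative alias, not a built-in column:

-- Rank tables by the average cost each read incurs
SELECT
    NAME,
    QUALIFIEDNAME,
    SOURCEREADCOUNT,
    SOURCEREADQUERYCOST,
    -- NULLIF avoids division by zero for never-read tables
    SOURCEREADQUERYCOST / NULLIF(SOURCEREADCOUNT, 0) AS COST_PER_READ
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
    AND SOURCEREADCOUNT > 0
ORDER BY COST_PER_READ DESC
LIMIT 20;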

Identify critical analytics assets

Discover frequently queried tables that are central to your analytics workflows. Use this to understand core analytical dependencies, protect critical assets, and confirm proper documentation and governance.

SELECT
    NAME,
    QUALIFIEDNAME,
    SOURCEREADCOUNT,
    SOURCEREADQUERYCOST,
    SOURCEREADUSERCOUNT,
    POPULARITYSCORE
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
    AND SOURCEREADCOUNT > 500
    AND SOURCELASTREADAT >= DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP()))
ORDER BY SOURCEREADCOUNT DESC;

Use this query to:

  • Detect central analytical tables with high query volume
  • Protect or monitor key assets more closely
  • Confirm critical tables have proper documentation and governance
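
To act on the governance point, you can filter the same population for tables with no recorded owner. This sketch assumes SOURCEOWNERS is NULL when ownership hasn't been assigned:

-- Heavily read tables with no recorded owner: candidates for governance follow-up
SELECT
    NAME,
    QUALIFIEDNAME,
    SOURCEREADCOUNT,
    POPULARITYSCORE
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
    AND SOURCEREADCOUNT > 500
    AND SOURCEOWNERS IS NULL
ORDER BY SOURCEREADCOUNT DESC;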

Reference

View definition

Create a view that aggregates key table-level cost and usage metrics to simplify your queries:

CREATE OR REPLACE VIEW VW_WAREHOUSE_USAGE AS
SELECT
    NAME,
    QUALIFIEDNAME,
    SIZEBYTES,
    SOURCETOTALCOST,
    SOURCEREADCOUNT,
    SOURCEREADQUERYCOST,
    SOURCEREADUSERCOUNT,
    SOURCEREADTOPUSERLIST,
    SOURCECOSTUNIT,
    SOURCEUPDATEDAT,
    SOURCELASTREADAT,
    SOURCEUPDATEDBY,
    SOURCEOWNERS,
    POPULARITYSCORE,
    CONNECTORNAME
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake';

Adapt the WHERE clause for your connector (for example, 'bigquery', 'databricks').
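
Once the view exists, you can query it like any other table. As a sketch, if you create the view without the single-connector filter (or with CONNECTORNAME IN ('snowflake', 'bigquery', 'databricks')), one query can compare spend across platforms:

-- Summarize table count, cost, and size per connector through the view
SELECT
    CONNECTORNAME,
    COUNT(*) AS TABLE_COUNT,
    SUM(SOURCETOTALCOST) AS TOTAL_COST,
    SUM(SIZEBYTES) / POWER(1024, 4) AS TOTAL_SIZE_TB
FROM VW_WAREHOUSE_USAGE
GROUP BY CONNECTORNAME
ORDER BY TOTAL_COST DESC;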

Column reference

Column                   Description
NAME                     Asset name
QUALIFIEDNAME            Fully qualified asset name
SIZEBYTES                Table size in bytes
SOURCETOTALCOST          Total cost of all operations on the table
SOURCEREADCOUNT          Total read query count
SOURCEREADQUERYCOST      Cost incurred from read operations
SOURCEREADUSERCOUNT      Number of distinct users who read the table
SOURCEREADTOPUSERLIST    List of top users who read the table
SOURCECOSTUNIT           Unit of cost measurement (for example, credits, USD)
SOURCEUPDATEDAT          Last update timestamp (epoch in milliseconds)
SOURCELASTREADAT         Last read timestamp (epoch in milliseconds)
SOURCEUPDATEDBY          User or process that last updated the table
SOURCEOWNERS             Table owners or responsible personas
POPULARITYSCORE          Atlan popularity metric = distinct users × log(total reads)
CONNECTORNAME            Source system name (for example, snowflake, bigquery, databricks)
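
As an illustration of the POPULARITYSCORE formula above, the sketch below recomputes an approximate score from the raw counts. The log base (natural log here) and any scaling Atlan applies are assumptions, so treat the stored POPULARITYSCORE as authoritative:

-- Approximate the popularity formula: distinct users × log(total reads)
SELECT
    NAME,
    POPULARITYSCORE,
    SOURCEREADUSERCOUNT * LN(SOURCEREADCOUNT) AS APPROX_SCORE  -- LN is an assumed log base
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
    AND SOURCEREADCOUNT > 0
ORDER BY POPULARITYSCORE DESC
LIMIT 20;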

Next steps

  • Build visual dashboards in Looker, Power BI, or Tableau using MDLH query outputs
  • Automate reporting by scheduling MDLH queries via Snowflake or Databricks
  • Integrate with Atlan AI or MCP tools to surface cost anomalies and performance insights in natural language