Database usage and cost analysis (Private Preview)
Analyze database usage, optimize query performance, and manage storage and compute costs across your data warehouse platforms using Lakehouse (MDLH).
This page provides SQL templates and guidance for cost and usage optimization across Snowflake, BigQuery, and Databricks.
Before you begin
Before you run these queries, make sure:
- Your Lakehouse setup is complete and Lakehouse is enabled for your organization. If setup isn't complete, see Get started with Lakehouse.
- The Gold layer is set up for your SQL engine. For Snowflake, see Set up gold layer in Snowflake. For Databricks (Databricks setup guide) or BigQuery (BigQuery setup guide), contact your Atlan Customer Success team.
You can run SQL queries directly against the TABLE_ENTITY table (and related metadata tables) in the Lakehouse to analyze cost and performance metrics. The sections below provide example queries for common cost optimization scenarios.
Identify stale and unused tables
Find tables that haven't been accessed or updated recently but still incur high storage or compute costs. Use this to identify candidates for archiving, cleanup, or cost optimization.
- Snowflake
SELECT
NAME,
QUALIFIEDNAME,
SIZEBYTES,
SOURCETOTALCOST,
TO_VARCHAR(TO_TIMESTAMP_LTZ(SOURCEUPDATEDAT / 1000), 'YYYY-MM-DD HH24:MI:SS') AS LAST_UPDATED,
TO_VARCHAR(TO_TIMESTAMP_LTZ(SOURCELASTREADAT / 1000), 'YYYY-MM-DD HH24:MI:SS') AS LAST_READ
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
AND (SOURCEUPDATEDAT IS NULL OR SOURCEUPDATEDAT < DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP())))
AND (SOURCELASTREADAT IS NULL OR SOURCELASTREADAT < DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP())))
ORDER BY SOURCETOTALCOST DESC, SIZEBYTES DESC
LIMIT 20;
This query:
- Returns tables that haven't been updated or read in the last 30 days
- Sorts results by total cost and then size
- Helps you prioritize assets for archiving or cleanup
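If your Gold layer runs on BigQuery instead, the same staleness check can be written with BigQuery's timestamp functions. This is a minimal sketch, assuming the Gold layer exposes the same TABLE_ENTITY columns there; confirm the dataset path and column availability in your setup.

-- Hypothetical BigQuery adaptation of the staleness check
SELECT
NAME,
QUALIFIEDNAME,
SIZEBYTES,
SOURCETOTALCOST,
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MILLIS(SOURCEUPDATEDAT)) AS LAST_UPDATED,
FORMAT_TIMESTAMP('%Y-%m-%d %H:%M:%S', TIMESTAMP_MILLIS(SOURCELASTREADAT)) AS LAST_READ
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'bigquery'
AND (SOURCEUPDATEDAT IS NULL OR SOURCEUPDATEDAT < UNIX_MILLIS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)))
AND (SOURCELASTREADAT IS NULL OR SOURCELASTREADAT < UNIX_MILLIS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)))
ORDER BY SOURCETOTALCOST DESC, SIZEBYTES DESC
LIMIT 20;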
Optimize high-cost queries
Identify assets that generate high query costs relative to their usage frequency. Use this to prioritize optimization efforts, implement caching strategies, or identify underutilized expensive resources.
- Snowflake
-- Find tables with high query cost and low read frequency
SELECT
NAME,
QUALIFIEDNAME,
SOURCEREADCOUNT,
SOURCEREADQUERYCOST,
TO_VARCHAR(TO_TIMESTAMP_LTZ(SOURCELASTREADAT / 1000), 'YYYY-MM-DD HH24:MI:SS') AS LAST_READ
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
ORDER BY SOURCEREADQUERYCOST DESC, SOURCEREADCOUNT ASC
LIMIT 20;
Use this query to:
- Surface cost-heavy tables with low read frequency
- Target tables for query optimization, caching, or pruning
-- Find high-activity tables that drive compute costs
SELECT
NAME,
QUALIFIEDNAME,
SOURCEREADCOUNT,
SOURCEREADQUERYCOST
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
AND SOURCEREADCOUNT > 100
AND SOURCEREADQUERYCOST > 100
AND SOURCELASTREADAT >= DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP()))
ORDER BY SOURCEREADQUERYCOST DESC;
Use this query to:
- Identify high-impact tables driving compute costs in the last 30 days
- Prioritize optimization or caching strategies for frequently accessed expensive tables
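A useful derived metric for both of these queries is cost per read, which normalizes query cost by access frequency so rarely read but expensive tables rise to the top. A Snowflake sketch; the COST_PER_READ alias is illustrative, not a Gold layer column:

-- Rank tables by cost per read (illustrative derived metric)
SELECT
NAME,
QUALIFIEDNAME,
SOURCEREADCOUNT,
SOURCEREADQUERYCOST,
SOURCEREADQUERYCOST / NULLIF(SOURCEREADCOUNT, 0) AS COST_PER_READ
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
AND SOURCEREADCOUNT > 0
ORDER BY COST_PER_READ DESC
LIMIT 20;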
Identify critical analytics assets
Discover frequently queried tables that are central to your analytics workflows. Use this to understand core analytical dependencies, protect critical assets, and confirm proper documentation and governance.
- Snowflake
SELECT
NAME,
QUALIFIEDNAME,
SOURCEREADCOUNT,
SOURCEREADQUERYCOST,
SOURCEREADUSERCOUNT,
POPULARITYSCORE
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
AND SOURCEREADCOUNT > 500
AND SOURCELASTREADAT >= DATE_PART(EPOCH_MILLISECOND, DATEADD(DAY, -30, CURRENT_TIMESTAMP()))
ORDER BY SOURCEREADCOUNT DESC;
Use this query to:
- Detect central analytical tables with high query volume
- Protect or monitor key assets more closely
- Confirm critical tables have proper documentation and governance
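To act on the governance point, you can invert the filter and surface heavily read tables with no recorded owners. A sketch, assuming SOURCEOWNERS is NULL when ownership hasn't been assigned; verify how your Gold layer represents missing owners:

-- Hypothetical governance check: busy tables with no recorded owners
SELECT
NAME,
QUALIFIEDNAME,
SOURCEREADCOUNT,
SOURCEREADUSERCOUNT
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake'
AND SOURCEREADCOUNT > 500
AND SOURCEOWNERS IS NULL
ORDER BY SOURCEREADCOUNT DESC;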
Reference
View definition
Create a view that aggregates key table-level cost and usage metrics to simplify your queries:
CREATE OR REPLACE VIEW VW_WAREHOUSE_USAGE AS
SELECT
NAME,
QUALIFIEDNAME,
SIZEBYTES,
SOURCETOTALCOST,
SOURCEREADCOUNT,
SOURCEREADQUERYCOST,
SOURCEREADUSERCOUNT,
SOURCEREADTOPUSERLIST,
SOURCECOSTUNIT,
SOURCEUPDATEDAT,
SOURCELASTREADAT,
SOURCEUPDATEDBY,
SOURCEOWNERS,
POPULARITYSCORE,
CONNECTORNAME
FROM TABLE_ENTITY
WHERE CONNECTORNAME = 'snowflake';
Adapt the WHERE clause for your connector (for example, 'bigquery', 'databricks').
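Once the view exists, the earlier queries reduce to short selects against it. For example, to list the ten most expensive tables:

-- Example: ten most expensive tables, queried from the view
SELECT NAME, QUALIFIEDNAME, SOURCETOTALCOST, SOURCECOSTUNIT
FROM VW_WAREHOUSE_USAGE
ORDER BY SOURCETOTALCOST DESC
LIMIT 10;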
Column reference
| Column | Description |
|---|---|
| NAME | Asset name |
| QUALIFIEDNAME | Fully qualified asset name |
| SIZEBYTES | Table size in bytes |
| SOURCETOTALCOST | Total cost of all operations on the table |
| SOURCEREADCOUNT | Total read query count |
| SOURCEREADQUERYCOST | Cost incurred from read operations |
| SOURCEREADUSERCOUNT | Number of distinct users who read the table |
| SOURCEREADTOPUSERLIST | List of top users who read the table |
| SOURCECOSTUNIT | Unit of cost measurement (for example, credits, USD) |
| SOURCEUPDATEDAT | Last update timestamp (epoch in milliseconds) |
| SOURCELASTREADAT | Last read timestamp (epoch in milliseconds) |
| SOURCEUPDATEDBY | User or process that last updated the table |
| SOURCEOWNERS | Table owners or responsible personas |
| POPULARITYSCORE | Atlan popularity metric: distinct users × log(total reads) |
| CONNECTORNAME | Source system name (for example, snowflake, bigquery, databricks) |
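For intuition, the POPULARITYSCORE formula can be recomputed from the view's own columns and compared with the stored value. A sketch that assumes a natural log, since the metric definition doesn't state the base:

-- Recompute the popularity formula from raw counts (assumes natural log)
SELECT
NAME,
SOURCEREADUSERCOUNT,
SOURCEREADCOUNT,
SOURCEREADUSERCOUNT * LN(SOURCEREADCOUNT) AS COMPUTED_POPULARITY,
POPULARITYSCORE
FROM VW_WAREHOUSE_USAGE
WHERE SOURCEREADCOUNT > 0;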
Next steps
- Build visual dashboards in Looker, Power BI, or Tableau using MDLH query outputs
- Automate reporting by scheduling MDLH queries via Snowflake or Databricks (see the sketch after this list)
- Integrate with Atlan AI or MCP tools to surface cost anomalies and performance insights in natural language
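As one way to implement the scheduling step, a Snowflake task can materialize a daily snapshot of the usage view for trend reporting. A minimal sketch; the task name, warehouse, target table, and cron schedule are all assumptions for your environment:

-- Hypothetical daily snapshot of usage metrics via a Snowflake task
CREATE TABLE IF NOT EXISTS WAREHOUSE_USAGE_SNAPSHOT AS
SELECT CURRENT_DATE() AS SNAPSHOT_DATE, * FROM VW_WAREHOUSE_USAGE WHERE 1 = 0;

CREATE OR REPLACE TASK DAILY_USAGE_SNAPSHOT
WAREHOUSE = MY_WH -- assumption: replace with your warehouse
SCHEDULE = 'USING CRON 0 6 * * * UTC' -- every day at 06:00 UTC
AS
INSERT INTO WAREHOUSE_USAGE_SNAPSHOT
SELECT CURRENT_DATE(), * FROM VW_WAREHOUSE_USAGE;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK DAILY_USAGE_SNAPSHOT RESUME;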