Troubleshooting usage and popularity metrics
Atlan currently supports usage and popularity metrics for the following connectors:
- Amazon Redshift: tables, views, and columns. Expensive queries and compute costs for Amazon Redshift assets are currently unavailable due to limitations at source.
- Databricks: tables, views, and columns. Expensive queries and compute costs for Databricks assets are currently unavailable due to limitations of the Databricks APIs.
- Google BigQuery: tables, views, and columns
- Microsoft Power BI: reports and dashboards
- Snowflake: tables, views, and columns
- Tableau: workbooks, dashboards, worksheets, published data sources, and flows. Popularity is sourced from Tableau Cloud during the standard crawl. No miner is required. Tableau Server, JWT bearer authentication, and unique-viewer counts aren't supported.
Can any user set this up?
There is no separate setup required. For Amazon Redshift, Databricks, Google BigQuery, Microsoft Power BI, and Snowflake, it's bundled with the respective miner packages. For Tableau, it's bundled with the standard Tableau crawler—no miner is required.
Which editions of Snowflake are supported?
The following Snowflake editions are supported:
- Standard Edition
- Enterprise Edition
- Business Critical Edition
- Virtual Private Snowflake (VPS)
Do account-level permissions need to be modified for setup?
No extra permissions are required.
- For enterprise customers, Atlan uses the ACCESS_HISTORY table to determine which objects have been accessed, keeping the data source as the source of truth.
- For other customers, Atlan uses internal logic to determine the same.
How can I set up popularity on my instance?
First-time setup:
- Head over to the Atlan marketplace and set up the miner package for a supported connector with a daily frequency (recommended).
- On the first run, popularity is calculated from the start date of the miner with a default window of 14 days.
- For all subsequent runs, the popularity window is increased to a maximum limit of 30 days.
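The window growth described above can be sketched as follows. This is an illustration only, not Atlan's implementation: the document states only the 14-day starting window and the 30-day cap, so the assumption that the window grows by one day per elapsed day is a simplification.

```python
# Illustrative sketch of the popularity lookback window (assumed growth
# schedule: one day per elapsed day, which is NOT confirmed by the docs).
FIRST_RUN_WINDOW_DAYS = 14
MAX_WINDOW_DAYS = 30

def popularity_window(days_since_first_run: int) -> int:
    """Return the lookback window (in days) for a given miner run."""
    return min(FIRST_RUN_WINDOW_DAYS + days_since_first_run, MAX_WINDOW_DAYS)

print(popularity_window(0))   # first run: 14-day window
print(popularity_window(30))  # later runs are capped at 30 days
```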
For preconfigured miners:
- The next run of the miner migrates the last 30 days' worth of query data and calculates popularity for 30 days (if available).
- Subsequent runs work as expected.
How's popularity score calculated?
The computation of popularity score is based on the number of read queries that used the data asset and the number of distinct users executing the read queries. Values are collected over a period of 30 days.
Popularity score = number of distinct readers * log (total number of read queries)
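The formula above can be expressed as a short sketch. Note that the base of the logarithm is an assumption (natural log here); the document doesn't specify it, and the zero-query guard is added only to keep the sketch well defined.

```python
import math

def popularity_score(distinct_readers: int, read_queries: int) -> float:
    """Popularity score = distinct readers * log(total read queries).

    The log base (natural log here) is an assumption; the formula in the
    docs doesn't specify it.
    """
    if read_queries == 0:
        return 0.0  # no read queries in the window means no score
    return distinct_readers * math.log(read_queries)

# e.g. 5 distinct readers issuing 100 read queries over the 30-day window
print(round(popularity_score(5, 100), 2))  # 23.03
```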
What asset types have popularity scores?
Popularity scores are currently available for all Amazon Redshift, Databricks, Google BigQuery, and Snowflake tables, views, and columns, Microsoft Power BI reports and dashboards, and Tableau workbooks, dashboards, worksheets, published data sources, and flows.
How's compute cost estimated per asset?
Amazon Redshift
Compute costs for Amazon Redshift assets are currently unavailable due to limitations at source.
Databricks
Compute costs for Databricks assets are currently unavailable due to limitations of the Databricks APIs.
Google BigQuery
If the Google BigQuery miner is configured to use:
- On-demand pricing: the bytes field indicates the total bytes billed for the query.
- Flat-rate pricing: the slot-ms (slot-milliseconds) field indicates the total slot utilization for the query. For example, if a 20-second query is continuously consuming 4 slots, then the query has utilized 80,000 slot-ms.
In Atlan, the compute cost of a Google BigQuery asset is estimated based on each individual query. For example:
- Query A ran on asset x = 10 slot-ms or bytes
- Query B ran on asset x = 15 slot-ms or bytes
- Estimated compute cost for asset x = 10 + 15 = 25 slot-ms or bytes
Learn more about Google BigQuery pricing.
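The per-query summation above amounts to summing one cost field (slot-ms under flat-rate pricing, bytes billed under on-demand pricing) over every query that ran on the asset. A minimal sketch, with hypothetical per-query records:

```python
# Hypothetical per-query usage records; the "cost" field stands in for
# slot-ms (flat-rate pricing) or bytes billed (on-demand pricing).
queries = [
    {"asset": "x", "cost": 10},  # Query A ran on asset x
    {"asset": "x", "cost": 15},  # Query B ran on asset x
    {"asset": "y", "cost": 40},  # a query on a different asset
]

def estimated_compute_cost(asset: str) -> int:
    """Sum the per-query cost of every query that ran on the asset."""
    return sum(q["cost"] for q in queries if q["asset"] == asset)

print(estimated_compute_cost("x"))  # 10 + 15 = 25
```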
Snowflake
In Atlan, the compute cost of a Snowflake asset is estimated based on each individual query. For example, if warehouse X-small costs 1 credit per hour:
- Query A ran from 1 p.m. to 5 p.m. on warehouse X-small = 4 credits
- Query B ran from 11 a.m. to 3 p.m. on warehouse X-small = 4 credits
- Estimated compute cost: 4 credits + 4 credits = 8 credits
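The estimation above charges each query the warehouse's hourly credit rate for the hours it ran. A minimal sketch; the rate table below is illustrative (X-Small is 1 credit per hour in the example):

```python
# Illustrative credit rates per warehouse size (credits per hour).
CREDITS_PER_HOUR = {"X-Small": 1}

def query_credits(warehouse: str, hours: float) -> float:
    """Credits consumed by one query: hourly rate * hours the query ran."""
    return CREDITS_PER_HOUR[warehouse] * hours

# Query A: 1 p.m. to 5 p.m. (4 hours); Query B: 11 a.m. to 3 p.m. (4 hours)
total = query_credits("X-Small", 4) + query_credits("X-Small", 4)
print(total)  # 8 credits
```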
Can queries run by ETL users be excluded?
Atlan supports excluding queries by users. You can exclude users while configuring the miner behavior for all supported connectors: Amazon Redshift, Databricks, Google BigQuery, Microsoft Power BI, and Snowflake.
Why are metrics missing even after running miner?
Usage and popularity metrics are computed using both the number of queries and the number of users who have queried that asset in the last 30 days. If an asset hasn't been queried in the last 30 days, there are no usage and popularity metrics to report for that asset.
If an asset has been queried and you've run the miner but metrics are still missing, the miner's fetch logic may be the cause.
When setting up the miner for the first time, you need to provide a start date anywhere from two days to two weeks in the past. If an asset wasn't queried during the selected time period, usage and popularity metrics are unavailable.
For subsequent runs, the miner fetches query history based on the following logic:
START_TIME ≤ CURRENT_DATE - INTERVAL '1 DAY'
For example, the miner logic for January 23 is:
Jan 22 5 p.m. ≤ Jan 23 00:00 - 1 day
Jan 22 5 p.m. ≤ Jan 22 00:00
This condition is false, so the query isn't fetched on January 23.
The miner doesn't fetch the data for the previous day (January 22) on the current day (January 23). Atlan requires a minimum lag of 24 to 48 hours to capture all the relevant transformations that were part of a query session.
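The fetch condition above can be checked with a short sketch. The function name and the concrete year are illustrative, not Atlan's implementation:

```python
from datetime import date, datetime, timedelta

def is_fetched(start_time: datetime, current_date: date) -> bool:
    """A query is fetched only if START_TIME <= CURRENT_DATE - 1 day."""
    cutoff = datetime.combine(current_date, datetime.min.time()) - timedelta(days=1)
    return start_time <= cutoff

# The example above: a query started Jan 22 at 5 p.m. (year is arbitrary).
# Mined on Jan 23, the cutoff is Jan 22 00:00, so the query isn't fetched.
print(is_fetched(datetime(2024, 1, 22, 17, 0), date(2024, 1, 23)))  # False
# Mined on Jan 24, the cutoff is Jan 23 00:00, so the query is picked up.
print(is_fetched(datetime(2024, 1, 22, 17, 0), date(2024, 1, 24)))  # True
```

This is why the document states a minimum lag of 24 to 48 hours before a query's metrics appear.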
What type of queries do you show for popular, slow, or expensive queries?
Only read queries or SELECT statements are shown for popular, slow, and expensive queries. DDL and DML statements aren't supported.