Understand collections
A collection is a curated group of data assets that share a common characteristic. Collections are the unit of work in Context Agents Studio—you track metadata coverage per collection and run context agents on the assets within them.
The key insight is that you don't define collections manually. Atlan builds them automatically from signals that already exist in your data ecosystem—query activity, lineage relationships, and BI usage—so the assets surfaced are the ones that matter most to your organization right now.
Default collections
When you first open Context Agents Studio, Atlan automatically creates a set of recommended collections, each representing a different lens on asset importance:
- Popular SQL Assets: Tables and views in Snowflake, Databricks, and BigQuery that have been frequently queried in the last 30 days. These are the assets your analysts, engineers, and data scientists rely on daily.
- Popular BI Reports: BI reports with significant user activity in the last 30 days. When your most-viewed reports are well-documented, teams can quickly understand what they're looking at and trust the data behind it.
- Gold Layer: SQL assets that directly feed your BI reports. These are the production-ready tables and views at the intersection of your data pipeline and business consumption.
- Upstream of Popular BI: SQL assets feeding your most popular BI reports. Enriching these builds context not just at the reporting layer but across the full lineage chain that powers your most important dashboards.
- DQ Connected Assets: Assets that already have data quality rules attached through connected DQ tools. Enriching their metadata alongside existing quality coverage creates a complete picture of data health and context.
- Assets with Owners: Assets that have assigned owners, helping you focus enrichment on assets that already have some governance in place.
- Assets with Product or Domain: Assets associated with a specific product or domain, surfacing the data that maps to your organization's business structure.
- Assets with Terms: Assets that have linked glossary terms, letting you enrich assets that already have some business context attached.
How collections are populated
Collections are automatically populated using usage signals: query logs, lineage relationships, and source metadata. Each collection draws on a different signal from Atlan's underlying analytics:
- Query activity from your data warehouse drives Popular SQL Assets
- Report usage from your BI connectors drives Popular BI Reports
- Lineage relationships between SQL sources and BI reports drive Gold Layer and Upstream of Popular BI
- Ownership, domain, and term assignments drive Assets with Owners, Assets with Product or Domain, and Assets with Terms
- Data quality connections drive DQ Connected Assets
Collections refresh periodically to reflect the current state of your data ecosystem. You don't need to configure or maintain them.
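As a quick reference, the signal-to-collection relationships listed above can be written down as a lookup table. This is only an illustrative sketch: the signal keys (`query_activity`, `lineage`, and so on) and the helper `signal_for_collection` are hypothetical names chosen for this example, not identifiers from Atlan.

```python
# Hypothetical signal keys; the collection names are the defaults described above.
SIGNAL_TO_COLLECTIONS = {
    "query_activity": ["Popular SQL Assets"],
    "bi_report_usage": ["Popular BI Reports"],
    "lineage": ["Gold Layer", "Upstream of Popular BI"],
    "governance_assignments": [
        "Assets with Owners",
        "Assets with Product or Domain",
        "Assets with Terms",
    ],
    "data_quality_connections": ["DQ Connected Assets"],
}


def signal_for_collection(collection_name):
    """Return the signal that drives a default collection, or None if unknown."""
    for signal, collections in SIGNAL_TO_COLLECTIONS.items():
        if collection_name in collections:
            return signal
    return None


print(signal_for_collection("Gold Layer"))
print(signal_for_collection("DQ Connected Assets"))
```

Reading the table in reverse like this makes it easy to see which upstream signal to check when a particular collection looks sparse.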
Metadata coverage tracking
For each collection, Context Agents Studio tracks a coverage percentage per metadata attribute: the percentage of assets in the collection that have that attribute filled in. This lets you immediately see where gaps are largest and prioritize enrichment accordingly.
Tracked attributes include:
- Description: whether the asset has a description
- Owners: whether the asset has assigned owners
- Certification: whether the asset has been certified
- README: whether the asset has a README attached
- Terms: whether the asset has linked glossary terms
- Tags: whether the asset has been tagged
You can customize which attributes are tracked per collection by clicking + Track metadata in the collection detail view.
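To make the coverage definition concrete, here is a minimal sketch of the arithmetic: for each attribute, count the assets where it is filled in and divide by the collection size. The asset dictionaries, the attribute keys, and the function `coverage_percent` are assumptions made for illustration; they don't reflect how Context Agents Studio stores or computes coverage internally.

```python
# Hypothetical attribute keys mirroring the tracked attributes listed above.
TRACKED_ATTRIBUTES = (
    "description", "owners", "certification", "readme", "terms", "tags",
)


def coverage_percent(assets, attributes=TRACKED_ATTRIBUTES):
    """Return, per attribute, the percentage of assets with a non-empty value."""
    assets = list(assets)
    if not assets:
        # An empty collection has nothing to cover; report 0.0 for every attribute.
        return {attr: 0.0 for attr in attributes}
    result = {}
    for attr in attributes:
        # Missing keys, None, empty strings, and empty lists all count as "not filled in".
        filled = sum(1 for asset in assets if asset.get(attr))
        result[attr] = round(100.0 * filled / len(assets), 1)
    return result


# Made-up sample collection of four assets.
collection = [
    {"name": "orders", "description": "Customer orders", "owners": ["ana"], "tags": []},
    {"name": "customers", "description": "", "owners": ["ben"], "tags": ["pii"]},
    {"name": "payments", "description": "Payment events", "owners": [], "tags": []},
    {"name": "refunds", "description": None, "owners": [], "tags": []},
]

coverage = coverage_percent(collection)
print(coverage["description"])  # 2 of 4 assets have a description -> 50.0
print(coverage["tags"])         # 1 of 4 assets is tagged -> 25.0
```

In this sample, tags show the largest gap among the attributes with any coverage, so tagging would be the first enrichment target.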
Enrichment limits
Each collection has an enrichment limit of 500 assets per agent run. When a collection contains more than 500 assets, only the first 500, ranked by popularity, are selected for the run. If popularity data isn't available, assets are selected in alphabetical order.
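The selection rule can be sketched as a sort followed by a cutoff. This is an illustration under stated assumptions, not Atlan's implementation: the `popularity` field, the function `select_for_enrichment`, and the choices to treat a higher score as more popular, rank unscored assets last, and sort alphabetically by asset name are all hypothetical.

```python
ENRICHMENT_LIMIT = 500  # assets per agent run, as stated above


def select_for_enrichment(assets, limit=ENRICHMENT_LIMIT):
    """Pick up to `limit` assets: by popularity when available, else alphabetically."""
    assets = list(assets)
    if len(assets) <= limit:
        return assets
    if any(a.get("popularity") is not None for a in assets):
        # Assumption: higher score means more popular; unscored assets rank last,
        # and ties are broken by name.
        ranked = sorted(
            assets,
            key=lambda a: (
                a.get("popularity") is None,
                -(a.get("popularity") or 0),
                a["name"],
            ),
        )
    else:
        # No popularity data at all: fall back to alphabetical order by name.
        ranked = sorted(assets, key=lambda a: a["name"].lower())
    return ranked[:limit]


# 600 made-up assets with popularity scores 0..599.
scored = [{"name": f"table_{i:04d}", "popularity": i} for i in range(600)]
picked = select_for_enrichment(scored)
print(len(picked), picked[0]["name"], picked[-1]["name"])

# The same assets without any popularity data.
unscored = [{"name": f"table_{i:04d}"} for i in range(600)]
picked_alpha = select_for_enrichment(unscored)
print(picked_alpha[0]["name"], picked_alpha[-1]["name"])
```

In practice, this means that assets beyond the cutoff in a large collection are not enriched in that run.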
Custom collections
Custom collection creation is coming soon. It will let teams create their own groupings of assets using tags and custom metadata, enabling more fine-grained, domain-level collections. For example, you could tag a set of high-priority assets and create a collection from that tag to focus enrichment efforts.
See also
- Atlan Lakehouse: The underlying platform that Context Agents Studio runs on
- Gold Layer reference: Detailed reference for the SQL assets in the Gold Layer
- Context agents: What each agent generates and which asset types it supports
- Trigger AI enrichment: How to run agents on a collection
- FAQ - Metadata enrichment: Source requirements, collection availability, and coverage questions