What is the Atlan Lakehouse?
The Atlan Lakehouse is a data lakehouse that contains metadata and usage data about all of the assets across your data estate, which you can use to build reports and AI applications.
- Leverage Atlan metadata instantly: Run SQL queries, create reporting dashboards, and power AI/ML applications on your Atlan metadata, without building data pipelines or extracts.
- Open, interoperable foundation: The Atlan Lakehouse is built on Apache Iceberg and uses Apache Polaris (incubating) as its Iceberg REST catalog, so you can use any Iceberg REST-compatible client to query it. Multiple engines can read from the Lakehouse concurrently.
- Autonomous experience: Atlan manages all infrastructure and maintenance for the Lakehouse so you can focus on building metadata-driven dashboards and AI applications. As part of this fully managed experience, Atlan performs regular table maintenance operations (for example, compaction) on the Lakehouse to maximize query performance and minimize storage utilization.
Architecture
The Atlan Lakehouse uses Apache Iceberg as its table format and Apache Polaris (incubating) as its Iceberg REST catalog. Underlying data is written using Apache Parquet.
Storage
Iceberg table metadata (for example, metadata files, snapshots, manifest lists, manifest files) and data files for the Lakehouse are stored in an Atlan-managed object storage bucket. Each tenant has a dedicated object storage bucket. All data in the object storage bucket is encrypted at rest using the object storage service's default encryption.
Catalog
The Lakehouse uses Apache Polaris (incubating) as its Iceberg REST catalog. You can use any Iceberg REST-compatible client to query the Lakehouse, including Snowflake, Trino, and Spark. The Iceberg REST catalog is the single entry point for engines to query the Lakehouse.
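For example, a Python client like pyiceberg can connect with nothing more than the catalog's REST endpoint and credentials. The following is a minimal sketch; the endpoint URI, credentials, and warehouse name are placeholders rather than Atlan-specific values:

```python
from pyiceberg.catalog import load_catalog

# Connect to the Lakehouse's Iceberg REST catalog (Apache Polaris).
# All <...> values are placeholders for tenant-specific settings.
catalog = load_catalog(
    "atlan",
    type="rest",
    uri="https://<your-lakehouse-endpoint>/api/catalog",
    credential="<client-id>:<client-secret>",  # OAuth2 client credentials
    warehouse="<warehouse-name>",
)

# Discover what the catalog exposes before querying.
print(catalog.list_namespaces())
```

Because every engine talks to the same REST endpoint, multiple clients can be pointed at the catalog at the same time.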
Infrastructure
All underlying storage and catalog infrastructure for the Lakehouse resides in an Atlan-managed account that matches the Atlan tenant's cloud provider. For example:
- AWS tenants → EKS cluster and Amazon S3 in Atlan's AWS account
- Azure tenants → AKS cluster and Azure Data Lake Storage (ADLS) in Atlan's Azure account
- GCP tenants → GKE cluster and Google Cloud Storage (GCS) in Atlan's GCP account
You bring your own Iceberg REST-compatible compute, such as Snowflake, Trino, or Spark, to query the Lakehouse.
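As a sketch of what bringing your own compute looks like, here is a PySpark session registered against an Iceberg REST catalog. The connection properties below (endpoint, credentials, warehouse) are placeholders, and your tenant's actual values may differ:

```python
from pyspark.sql import SparkSession

# Register the Lakehouse as an Iceberg REST catalog named "atlan".
# All <...> values are placeholders for tenant-specific settings.
spark = (
    SparkSession.builder.appName("atlan-lakehouse-example")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.atlan", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.atlan.type", "rest")
    .config("spark.sql.catalog.atlan.uri",
            "https://<your-lakehouse-endpoint>/api/catalog")
    .config("spark.sql.catalog.atlan.credential", "<client-id>:<client-secret>")
    .config("spark.sql.catalog.atlan.warehouse", "<warehouse-name>")
    # Ask the catalog to vend temporary storage credentials at query time.
    .config("spark.sql.catalog.atlan.header.X-Iceberg-Access-Delegation",
            "vended-credentials")
    .getOrCreate()
)

spark.sql("SHOW NAMESPACES IN atlan").show()
```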
How it works
Once the Lakehouse is enabled for your Atlan workspace, connect your preferred Iceberg REST-compatible client to it, for example, Snowflake, Trino, or Spark. From there, you can query the Lakehouse directly, or use your preferred dashboarding tool (for example, Tableau, Power BI, or Sigma) to visualize the data in the Lakehouse, just as you do with other data sources in your data warehouse or query engine.
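With a catalog registered as in the Spark sketch above, a query against the Lakehouse reads like a query against any other catalog. The namespace, table, and column names here are hypothetical; see the data reference for the actual entities:

```python
# Hypothetical names; see the Lakehouse data reference for real namespaces,
# tables, and columns.
spark.sql("""
    SELECT asset_type, COUNT(*) AS asset_count
    FROM atlan.<namespace>.<table>
    GROUP BY asset_type
    ORDER BY asset_count DESC
""").show()
```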
During query time:
- The client submits a query to the Lakehouse catalog.
- The Lakehouse catalog returns the locations of the data files needed to execute the query, along with temporary credentials for the object storage locations where those files reside.
- The client uses the vended credentials to retrieve the files required to process the query.
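Traced through a pyiceberg read, the steps look like this (a sketch; `catalog` is the connection from the earlier example and the table identifier is a placeholder):

```python
# 1. The client submits its request to the Lakehouse catalog, which resolves
#    the table and, with credential vending, returns temporary credentials
#    for the table's object storage location.
table = catalog.load_table("<namespace>.<table>")

# 2. From the table's metadata (snapshots, manifests), the client determines
#    which data files it needs for this query.
scan = table.scan(limit=100)

# 3. The client uses the vended credentials to fetch only the required
#    Parquet files from object storage and materializes the result.
result = scan.to_arrow()
print(result.num_rows)
```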
Autonomous experience
Atlan manages all operational aspects of the Lakehouse so you can focus on building reports and AI applications:
- Infrastructure: Atlan manages the lifecycle of all the Lakehouse's underlying infrastructure, including core infrastructure like object storage, the Apache Polaris catalog, and Kubernetes, as well as the data pipelines that feed data into the Lakehouse.
- Data maintenance: Atlan performs regular table maintenance operations to maximize query performance and minimize storage utilization, as illustrated below. These operations include compaction, snapshot expiration, manifest rewriting, and orphan file cleanup, and they run without impacting query performance.
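You don't run any of this yourself. Purely to illustrate what the named operations do, here is how the equivalent Iceberg table maintenance procedures look when invoked manually from Spark; the catalog and table names are placeholders:

```python
# Illustrative only: Atlan performs these operations automatically.
spark.sql("CALL atlan.system.rewrite_data_files(table => '<namespace>.<table>')")   # compaction
spark.sql("CALL atlan.system.expire_snapshots(table => '<namespace>.<table>')")     # snapshot expiration
spark.sql("CALL atlan.system.rewrite_manifests(table => '<namespace>.<table>')")    # manifest rewriting
spark.sql("CALL atlan.system.remove_orphan_files(table => '<namespace>.<table>')")  # orphan file cleanup
```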
See also
- Security: Security model and access controls
- Data reference: Namespaces and entities in the Lakehouse
- Use cases: Common use cases and sample queries