Atlan is a fully virtualized solution that does not involve moving data from existing storage layers.
Atlan crawls metadata from upstream data sources and stores it within the secure VPC (virtual private cloud), which is either the customer's own or one managed by Atlan. Any queries run in Atlan are pushed down to existing processing layers (e.g. directly to your database or warehouse, or to a processing layer such as Athena or Presto on top of blob storage).
Data and metadata collected and created by Atlan are stored in applications and databases within the VPC. This includes data quality profiles, asset metadata, and user data.
Atlan gives users the ability to see sample data previews for a data asset as well as the results for any queries run on Atlan.
In both cases, the request is pushed upstream to the data source, and a 100-row sample of the result is shown to Atlan users. Atlan caches this 100-row sample in Redis so that a data preview or query loads faster when you return to it. Customers who don't want any data stored in Atlan can disable this cache.
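The preview flow above (push the query to the source, keep only 100 rows, optionally cache them) can be sketched as follows. This is a minimal illustration under stated assumptions, not Atlan's implementation: an in-memory SQLite database stands in for the upstream warehouse, a small dictionary-backed class (`PreviewCache`, hypothetical) stands in for Redis, and the `cache_enabled` flag models the option to switch caching off. Against a real Redis server the `set` call would typically be a `SETEX` so that entries expire on their own.

```python
import json
import sqlite3
import time
from typing import Any, Optional

SAMPLE_ROWS = 100  # preview size stated in the text


class PreviewCache:
    """Hypothetical in-memory stand-in for Redis: key -> (expiry time, JSON payload)."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: str) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def preview(conn: sqlite3.Connection, sql: str, cache: PreviewCache,
            cache_enabled: bool = True) -> list[list[Any]]:
    """Return at most 100 rows for `sql`, serving repeat requests from the cache."""
    key = f"preview:{sql}"
    if cache_enabled:
        hit = cache.get(key)
        if hit is not None:
            return json.loads(hit)
    # The query runs on the source; only the first SAMPLE_ROWS rows are fetched back.
    rows = [list(r) for r in conn.execute(sql).fetchmany(SAMPLE_ROWS)]
    if cache_enabled:
        cache.set(key, json.dumps(rows))
    return rows
```

With `cache_enabled=False`, nothing is written to the cache and every preview goes back to the source.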
Users can generate data quality metrics with the click of a button in Atlan. Once generated, these metrics are stored in PostgreSQL within the VPC. Like the rest of what Atlan stores, these metrics are metadata: they describe the accuracy, completeness, structure, and quality of your data.
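The text does not list exactly which metrics Atlan computes. As a hedged illustration of what metadata describing completeness and structure can look like, the hypothetical `profile_column` function below derives a few common column-profile figures from a column's values (which must be hashable). Persisting the result to PostgreSQL is left out.

```python
from typing import Any


def profile_column(values: list[Any]) -> dict[str, float]:
    """Summarize a column without keeping its values: only counts and ratios are returned."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    return {
        "row_count": total,
        "null_count": total - len(non_null),
        # Share of rows that actually hold a value.
        "completeness": len(non_null) / total if total else 0.0,
        "distinct_count": len(set(non_null)),
    }
```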
Asset metadata, including metadata crawled by Atlan or data lineage generated in the product, is stored across Apache Atlas, Elasticsearch, and Cassandra.
Atlas provides the graph layer that stores entity relationships and attributes, Elasticsearch powers search in the product, and Cassandra acts as the persistence backend.
Data on users, roles, and groups is stored in a PostgreSQL database. Keycloak uses this information for access and identity management.
Sensitive fields such as passwords are stored only in hashed form. Any user data transmitted over the internet is encrypted with TLS over HTTPS.
The Atlan authentication process runs on Keycloak, which supports username-password login as well as login based on open standards such as SAML 2.0. Atlan can also integrate with organizations’ existing SAML 2.0-based SSO systems.
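Storing a hash means the saved value cannot be turned back into the password; a login is checked by re-hashing the submitted password with the same salt and comparing the results. The sketch below shows that pattern with PBKDF2 from Python's standard library. The algorithm, salt length, and iteration count are assumptions for illustration; the scheme Atlan and Keycloak actually use is not stated here.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 100_000  # illustrative work factor, not Atlan's or Keycloak's setting


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash with the stored salt; compare in constant time to avoid timing leaks."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and digest would be persisted; the plaintext password never is.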
For centralized management of users and groups, Atlan attaches granular access policies to every action in the product. Access can be granted or revoked from the "Access" tab.
Admins and Stewards can define policies for every action and asset on Atlan, allowing or denying view and edit access to assets at any level, from databases down to individual columns. Organizations can also build policies based on asset classification, which makes it possible to restrict access to automatically detected Personally Identifiable Information (PII), an essential capability under regulations such as GDPR.
Access on Atlan is hierarchical: Admins can grant access to all the assets within a database or schema, or only to a single table or column, depending on the use case.
Access can be provided to just view a dataset or to collaborate on it (i.e. edit access).
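A minimal sketch of hierarchical evaluation, assuming assets are addressed by their path from database to column and that a policy on a path covers everything beneath it. Two further rules are assumptions, not documented Atlan behavior: edit access implies view access, and an explicit deny overrides any allow. `Policy`, `covers`, and `is_allowed` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical addressing: ("db", "schema", "table", "column"), truncated for higher levels.
AssetPath = tuple[str, ...]

ACTION_RANK = {"view": 1, "edit": 2}  # assumption: edit access implies view access


@dataclass(frozen=True)
class Policy:
    principal: str    # user or group name
    scope: AssetPath  # e.g. ("sales_db",) or ("sales_db", "public", "orders")
    action: str       # "view" or "edit"
    effect: str       # "allow" or "deny"


def covers(scope: AssetPath, asset: AssetPath) -> bool:
    """A policy on a database or schema applies to every asset beneath it."""
    return asset[: len(scope)] == scope


def is_allowed(policies: list[Policy], principals: set[str],
               asset: AssetPath, action: str) -> bool:
    matching = [p for p in policies
                if p.principal in principals and covers(p.scope, asset)]
    requested = ACTION_RANK[action]
    # Assumption: a deny wins. Denying "view" also blocks "edit"; denying "edit" leaves "view".
    if any(p.effect == "deny" and ACTION_RANK[p.action] <= requested for p in matching):
        return False
    # An allow grants the requested action if it is at least as strong.
    return any(p.effect == "allow" and ACTION_RANK[p.action] >= requested for p in matching)
```

Under these rules, granting `view` on a schema and denying `view` on one column inside it lets a group browse the whole schema except that column; with no matching policy, access is denied.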
Atlan scans columns and automatically tags those that contain Personally Identifiable Information (e.g. names, email addresses) with a PII classification.
Access policies can be built out on top of this classification, allowing organizations to restrict access to and maintain controls over all PII data.
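The text does not say how Atlan detects PII. Purely as an illustration, the sketch below flags a column when its name suggests a personal attribute or when most sampled values look like email addresses, then applies a classification-based check in which a user may view a column only if cleared for every tag on it. The heuristics, the 0.8 threshold, and the function names (`detect_pii`, `classify`, `can_view`) are all hypothetical.

```python
import re

# Hypothetical heuristics; Atlan's actual detection method is not described in the text.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
NAME_HINTS = ("name", "first_name", "last_name", "full_name")


def detect_pii(column_name: str, sample: list[str], threshold: float = 0.8) -> bool:
    """Flag a column if its name looks personal or most sampled values look like emails."""
    lowered = column_name.lower()
    if lowered in NAME_HINTS or "email" in lowered:
        return True
    values = [v for v in sample if v]
    if not values:
        return False
    matches = sum(1 for v in values if EMAIL_RE.match(v))
    return matches / len(values) >= threshold


def classify(columns: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each column name to its set of classification tags."""
    return {name: ({"PII"} if detect_pii(name, values) else set())
            for name, values in columns.items()}


def can_view(column_tags: set[str], user_clearances: set[str]) -> bool:
    """Classification-based policy: every tag on the column must be cleared for the user."""
    return column_tags <= user_clearances
```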
On Atlan, organizations can set up customized Classifications, which can be associated with data assets like tables and propagated downstream to any columns or tables created from them.
As with PII-classified assets, organizations can add access controls based on these classifications to restrict their availability.
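Downstream propagation amounts to a walk over the lineage graph. In the sketch below, assuming lineage is given as a map from each asset to the assets created directly from it, every classification on a source asset is copied to all of its descendants; a visited set keeps cycles from looping forever. The asset names and the `propagate` function are hypothetical.

```python
from collections import deque


def propagate(lineage: dict[str, list[str]],
              tags: dict[str, set[str]]) -> dict[str, set[str]]:
    """Copy each asset's classifications to every asset derived from it, directly or not."""
    result = {asset: set(t) for asset, t in tags.items()}
    for source, source_tags in tags.items():
        if not source_tags:
            continue
        # Breadth-first walk over downstream edges starting at the tagged asset.
        queue = deque(lineage.get(source, []))
        seen: set[str] = set()
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            result.setdefault(node, set()).update(source_tags)
            queue.extend(lineage.get(node, []))
    return result
```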
Atlan is deployed using Kubernetes in the customer’s VPC. The Kubernetes control plane is not publicly accessible from the internet; access to it is controlled by network access control lists restricted to the IP addresses needed to administer the cluster.
Through network access control lists, nodes are configured to accept connections only from the control plane on the specified ports, plus connections for Kubernetes services of type NodePort and LoadBalancer.
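Both network rules above reduce to a default-deny allowlist check on source address and destination port. The sketch below models that check with Python's `ipaddress` module. Real cloud network ACLs also carry rule numbers, explicit deny entries, and a direction; the CIDR ranges and ports in `CONTROL_PLANE_ACL` and `NODE_ACL` (the kubelet port 10250 and the default NodePort range 30000-32767) are illustrative, not Atlan's actual configuration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    cidr: str      # allowed source range
    port_min: int  # inclusive destination port range
    port_max: int


def permitted(rules: list[Rule], source_ip: str, port: int) -> bool:
    """Default-deny: accept only if some rule matches both the source and the port."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(r.cidr) and r.port_min <= port <= r.port_max
               for r in rules)


# Hypothetical admin range (TEST-NET-3 documentation block) allowed to reach the API server.
CONTROL_PLANE_ACL = [Rule("203.0.113.0/28", 443, 443)]

# Hypothetical node rules: control-plane subnet to the kubelet, load-balancer subnet to NodePorts.
NODE_ACL = [
    Rule("10.0.0.0/24", 10250, 10250),
    Rule("10.0.1.0/24", 30000, 32767),
]
```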
Each component of the Kubernetes cluster has security measures configured. These security measures are at the following levels:
Data encryption in transit
Atlan has built-in monitoring systems that help users manage the behind-the-scenes infrastructure while ensuring adherence to the highest standards of security.
Atlan uses Grafana, an open source analytics and monitoring solution, to give all Admin users complete visibility into CPU, memory, and storage metrics through industry-standard dashboards.
Slack notifications can be enabled across Atlan infrastructure services for proactive alerting.
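How Atlan wires its Slack alerts is not described. As a hedged sketch, the code below builds a message when a resource metric crosses a threshold and posts it as JSON with a `text` field to a Slack incoming webhook. The webhook URL, the threshold values, and the function names `build_alert` and `send_to_slack` are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_alert(service: str, metric: str, value: float, threshold: float) -> Optional[dict]:
    """Return a Slack message payload if the utilization ratio reaches the threshold, else None."""
    if value < threshold:
        return None
    return {"text": f"[ALERT] {service}: {metric} at {value:.0%} (threshold {threshold:.0%})"}


def send_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping message construction separate from delivery lets the threshold logic be tested without network access.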
Atlan uses standard encryption to protect data in transit: communication runs over Hypertext Transfer Protocol Secure (HTTPS), which is encrypted using Transport Layer Security (TLS). Atlan also supports Two-Factor Authentication (2FA) for accessing resources.
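The exact TLS versions and ciphers Atlan enforces are not stated. The sketch below shows what TLS-protected transit looks like from a client's side in Python: a context that verifies the server certificate and hostname and refuses protocol versions older than TLS 1.2. The minimum version is an assumption, and the host passed to the hypothetical `fetch_status` helper would be your Atlan instance's domain.

```python
import http.client
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and hostnames and requires TLS 1.2 or newer."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED and hostname checking by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def fetch_status(host: str, path: str = "/") -> int:
    """Issue a GET over HTTPS using the strict context and return the response status."""
    conn = http.client.HTTPSConnection(host, context=make_tls_context(), timeout=10)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```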
To ensure the highest standards of security, Atlan has adopted global industry standards in security practices and solutions. These include:
Vulnerability management through frequent releases: Atlan ships weekly releases to minimize vulnerabilities at both the product and operating-system level.
Application Penetration Testing (APT): Atlan is working with AppSecure to conduct industry-standard APT. A penetration test is an authorized simulated cyberattack on a computer system, performed to evaluate the security of the system. The test is performed to identify both weaknesses (including the potential for unauthorized parties to gain access to the system's features and data) and strengths, enabling a full risk assessment to be completed.
Event logging and monitoring: Atlan uses Prometheus and Grafana for monitoring, and Fluentd and Loki for event logging.
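The text names Fluentd and Loki but not the log format. Emitting one JSON object per line on stdout is a common convention that log shippers can ingest without custom parsing; the sketch below does this with the standard `logging` module. The `JsonFormatter` class, the `fields` attribute used for structured context, and the logger name are assumptions for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Hypothetical convention: structured context arrives via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def get_logger(name: str) -> logging.Logger:
    """Logger that writes JSON lines to stdout, where a collector can pick them up."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

For example, `get_logger("atlan.audit").info("policy updated", extra={"fields": {"user": "alice"}})` writes one line whose JSON includes `"user": "alice"` alongside the level and message.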