Atlan uses a combination of solutions to provide a seamless release experience. There are four key components of the Atlan release process:
GitHub is used for source code management and version control.
GitHub Registry is used to publish and consume all container images.
Replicated is a third-party service for managing on-premises software. Atlan uses Replicated to manage enterprise-grade product releases for all customers.
Atlan installs applications using per-customer licenses. Each license uniquely identifies the customer, defines their release channel, and specifies their entitlements.
The Atlan team provides a License URL at the time of the initial deployment; it contains all of the information above.
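As an illustration only, the sketch below models what such a license carries and how its release channel could limit which releases a customer sees. The `License` class, its fields, and `available_releases` are hypothetical and do not reflect Replicated's actual license format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class License:
    """Hypothetical model of a per-customer license (not Replicated's real schema)."""

    customer_id: str  # uniquely identifies the customer
    channel: str  # release channel the customer follows, e.g. "stable"
    entitlements: dict = field(default_factory=dict)  # feature flags and limits

    def is_entitled(self, feature: str) -> bool:
        # A feature missing from the license is treated as not entitled.
        return bool(self.entitlements.get(feature, False))


def available_releases(lic: License, releases: list[tuple[str, str]]) -> list[str]:
    """Return the versions published on the customer's channel, in input order.

    `releases` is a hypothetical list of (version, channel) pairs.
    """
    return [version for version, channel in releases if channel == lic.channel]


# Usage
lic = License("acme-corp", "stable", {"sso": True})
print(lic.is_entitled("sso"))  # True
print(available_releases(lic, [("1.0.0", "stable"), ("1.1.0-beta", "beta"), ("1.1.0", "stable")]))
# ['1.0.0', '1.1.0']
```

Keeping the channel on the license means customers on different channels can be served from the same pool of published releases.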
Follow the steps below to deploy a new release.
Visit the release portal endpoint and enter the password. For AWS, the release URL and password are available in your Cloud Setup Output.
Once you log in to the release portal, click the "Version history" tab in the top navigation bar. Then click the "Check for updates" button to fetch the latest release.
If a new release is available, it appears as a new version in the portal.
To deploy it, click the "Deploy" button on the "Version history" tab.
The cluster is then updated to the new version, and that version shows the "Deployed" status.
If a new release has a bug or fails to install, you can roll back by clicking the "Rollback" button on the release portal.
Atlan has baked-in monitoring and logging to assist in debugging.
Here are the components used in Atlan's monitoring and logging:
Prometheus is an open-source system monitoring and alerting toolkit.
Atlan deploys Prometheus to fetch metrics from the customer's EKS cluster and push metrics to the Atlan Thanos cluster.
Grafana allows you to query, visualize, and understand your metrics, no matter where they are stored.
Atlan uses Grafana to visualize metrics fetched from Prometheus. Customers can access Grafana at /services/monitor.
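Grafana reads these metrics through Prometheus's HTTP API. As a minimal sketch, the same kind of instant query can be sent with Python's standard library. `PROMETHEUS_URL` is a hypothetical placeholder, since the address of the in-cluster Prometheus service is not covered here; the `/api/v1/query` endpoint and its response shape are Prometheus's own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address of the Prometheus server; replace with your own.
PROMETHEUS_URL = "http://prometheus.example:9090"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for the Prometheus instant-query endpoint, /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: dict) -> dict:
    """Map each returned series' labels (as a sorted tuple) to its sample value."""
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "Prometheus query failed"))
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample is [unix_timestamp, "value-as-string"].
    return {
        tuple(sorted(series["metric"].items())): float(series["value"][1])
        for series in data["result"]
    }


def query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run an instant query over HTTP and return the parsed vector."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))


# Usage (needs network access to Prometheus):
# for labels, value in query(PROMETHEUS_URL, "up").items():
#     print(dict(labels), value)
```

Separating URL building and response parsing from the network call keeps the parsing logic testable without a live server.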
Thanos is an open-source, highly available Prometheus setup with long-term storage capabilities. Atlan uses Thanos to collect metrics from the customer deployments it supports.
The Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating and grouping alerts and routing them to the correct receiver integration, such as email, PagerDuty, or OpsGenie. It also handles silencing and inhibition of alerts. Atlan has integrated Alertmanager with PagerDuty.
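Deduplication, routing, and grouping can be pictured with a simplified model. The sketch below is not Alertmanager's implementation: it treats alerts with identical label sets as duplicates, sends each remaining alert to the first matching receiver (a flat stand-in for Alertmanager's routing tree), and groups alerts per receiver by hypothetical `alertname` and `cluster` labels. The routes and receiver names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical grouping labels and routes; real ones live in the Alertmanager config.
GROUP_BY = ("alertname", "cluster")
ROUTES = [({"severity": "critical"}, "pagerduty")]  # (label matcher, receiver)
DEFAULT_RECEIVER = "email"


def route(labels: dict, routes=ROUTES, default=DEFAULT_RECEIVER) -> str:
    """Return the receiver of the first route whose matcher labels all match."""
    for matcher, receiver in routes:
        if all(labels.get(k) == v for k, v in matcher.items()):
            return receiver
    return default


def group_alerts(alerts: list[dict], routes=ROUTES, group_by=GROUP_BY) -> dict:
    """Drop repeated alerts, then group the rest by receiver and `group_by` labels."""
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        fingerprint = frozenset(labels.items())  # identical labels => same alert
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        key = (route(labels, routes),) + tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "HighCPU", "cluster": "c1", "pod": "a", "severity": "critical"},
    {"alertname": "HighCPU", "cluster": "c1", "pod": "a", "severity": "critical"},  # duplicate
    {"alertname": "HighCPU", "cluster": "c1", "pod": "b", "severity": "critical"},
    {"alertname": "DiskFull", "cluster": "c1", "severity": "warning"},
]
for key, members in group_alerts(alerts).items():
    print(key, len(members))
# ('pagerduty', 'HighCPU', 'c1') 2
# ('email', 'DiskFull', 'c1') 1
```

Grouping means one notification can cover several related alerts, so the on-call engineer is paged once per group rather than once per pod.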
PagerDuty handles alerting and on-call scheduling so that Atlan teams are ready to act quickly. The Atlan SRE team receives a call whenever there is an issue with a customer instance.
Atlan pushes logs, including application request logs and audit logs, to an S3 bucket in the customer's AWS account. Customers can access these logs by viewing that bucket in their AWS account.
Atlan stores backups in an object storage service such as S3, following the backup strategies below.
Cluster Backup: Velero is an open-source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes. Atlan uses Velero to keep a full cluster backup based on a defined policy.
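Velero expresses such a policy as a `Schedule` resource. As a sketch under assumptions, the function below builds one as a Python dictionary. The schedule name, cron expression, and retention (TTL) are hypothetical values, since the actual policy is not specified here.

```python
import json


def velero_schedule(name: str, cron: str, ttl_hours: int, namespace: str = "velero") -> dict:
    """Build a Velero Schedule manifest that backs up every namespace on a cron schedule.

    All argument values are hypothetical; the real policy is not specified in this article.
    """
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "schedule": cron,  # standard five-field cron expression
            "template": {  # the Backup spec used for each scheduled run
                "includedNamespaces": ["*"],  # "*" selects every namespace
                "ttl": f"{ttl_hours}h0m0s",  # how long Velero keeps each backup
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical policy: a full backup every day at 02:00, kept for 14 days.
    manifest = velero_schedule("full-cluster-daily", "0 2 * * *", ttl_hours=14 * 24)
    print(json.dumps(manifest, indent=2))
```

`kubectl` accepts JSON manifests as well as YAML, so the printed output could be piped into `kubectl apply -f -`.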
Database Backup: Custom Kubernetes Jobs, scheduled according to a defined policy, back up all databases to object storage. Under the lifecycle policy, backups are retained on the standard tier for the last 14 days and are then moved to cold storage (Glacier). Backups are deleted from cold storage after 90 days.
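The retention rules above map naturally onto an S3 lifecycle rule. The sketch below builds one in the shape expected by S3's PutBucketLifecycleConfiguration API and models which tier a backup of a given age sits in. The rule ID and key prefix are hypothetical, and it assumes the 90 days are counted from when the backup is created, which is how S3 counts both transition and expiration days. If the 90 days instead start when a backup reaches Glacier, the expiration would be 104 days.

```python
import json

TRANSITION_DAYS = 14  # days a backup stays on the standard tier
EXPIRATION_DAYS = 90  # assumption: counted from backup creation, as S3 does


def lifecycle_configuration(prefix: str) -> dict:
    """S3 lifecycle configuration in the shape accepted by PutBucketLifecycleConfiguration."""
    return {
        "Rules": [
            {
                "ID": "database-backup-retention",  # hypothetical rule name
                "Filter": {"Prefix": prefix},  # hypothetical key prefix for DB backups
                "Status": "Enabled",
                "Transitions": [{"Days": TRANSITION_DAYS, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": EXPIRATION_DAYS},
            }
        ]
    }


def storage_tier(age_days: int) -> str:
    """Simplified model of where a backup of a given age lives under this rule.

    S3 applies lifecycle actions asynchronously, so real transitions can lag slightly.
    """
    if age_days < TRANSITION_DAYS:
        return "STANDARD"
    if age_days < EXPIRATION_DAYS:
        return "GLACIER"
    return "DELETED"


print(json.dumps(lifecycle_configuration("db-backups/"), indent=2))
print([storage_tier(d) for d in (0, 13, 14, 89, 90)])
# ['STANDARD', 'STANDARD', 'GLACIER', 'GLACIER', 'DELETED']
```

With boto3, this dictionary could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`, together with the backup bucket's name.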