Atlan Platform Management

A brief overview of the key components used in Atlan's release process and platform management.

Atlan Release Management

Atlan uses a combination of solutions to provide a seamless release experience. There are four key components of the Atlan release process:

GitHub

GitHub is used for source code management and version control.

GitHub Registry

GitHub Registry is used to publish and consume all container images.

Replicated

Replicated is a third-party service for managing on-premises software. Atlan uses Replicated to manage enterprise-grade product releases for all customers.

License Management

Atlan uses per-customer licenses to install applications. Each of these licenses uniquely identifies the customer, defines their release channel, and defines entitlement information about the customer.

The Atlan team provides a license URL at the time of the initial deployment, which contains all the information mentioned above.

How to update a release

Follow the steps below to update a release.

STEP 1: Log into the release portal

Visit the release portal endpoint and enter the password. For AWS, the release URL and password are available in your Cloud Setup Output.

Log into release portal

STEP 2: Check for updates

Once you log in to the release portal, click the "Version history" tab in the top navigation bar. Then click the "Check for updates" button to fetch the latest release.

Check for updates

If a new release is available, the new version appears in your portal.

Check for new version

STEP 3: Click on "Deploy" to release

To perform a release, click the "Deploy" button on the "Version history" tab.

Click on Deploy to release

At this point, the current cluster will be updated to the new version, and the Deployed status will show on that version.

Deployed status

How to roll back a release

If a new release has a bug or fails to install, you can roll it back by clicking the Rollback button in the release portal.

Monitoring and Logging

Atlan has baked-in monitoring and logging to assist in debugging.

Here are the components used in Atlan's monitoring and logging:

Prometheus

Prometheus is an open-source system monitoring and alerting toolkit.

Atlan deploys Prometheus to fetch metrics from the customer's EKS cluster and push metrics to the Atlan Thanos cluster.

Grafana

Grafana allows you to query, visualize, and understand your metrics, no matter where they are stored.

Atlan uses Grafana to visualize metrics fetched from Prometheus. Customers can access Grafana at /services/monitor.

Thanos

Thanos is an open-source, highly available Prometheus setup with long-term storage capabilities. We use Thanos to collect metrics from customer deployments where we are providing support.

Alertmanager

Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration, such as email, PagerDuty, or OpsGenie. It also handles silencing and inhibition of alerts. We have integrated Alertmanager with PagerDuty.
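The deduplication and grouping described above can be illustrated with a toy Python sketch. This is not Alertmanager's implementation; the function name `group_alerts` and the sample labels are hypothetical, and the sketch only assumes that each alert is a set of labels and that grouping is driven by a chosen list of label names.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Drop alerts with an identical label set, then group the rest by selected labels."""
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        # Two alerts with exactly the same labels count as duplicates.
        fingerprint = tuple(sorted(labels.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Alerts that share the same values for the group_by labels land in one group.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"alertname": "HighCPU", "instance": "a"},
        {"alertname": "HighCPU", "instance": "a"},  # duplicate, dropped
        {"alertname": "HighCPU", "instance": "b"},
        {"alertname": "DiskFull", "instance": "a"},
    ]
    for key, members in group_alerts(sample, ["alertname"]).items():
        print(key, len(members))
```

Each resulting group would then be routed to a single receiver as one notification rather than one notification per alert.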

PagerDuty

PagerDuty helps with alerting and scheduling so Atlan teams are ready to take fast action. The Atlan SRE team receives a call in case of issues with a customer instance.

Logs

Atlan pushes logs to an S3 bucket in the customer's AWS account. Atlan stores application request logs and audit logs. Customers can access these logs by viewing the S3 bucket in their AWS account.

Path for logs: s3://cluster-logs/product-url/yyyy/mm/dd/
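To find the logs for a particular day, the path pattern above can be expanded into a concrete prefix. The following is a minimal sketch: the function name `log_prefix` and the sample hostname are hypothetical, and it assumes that `cluster-logs` is the bucket name and that `product-url` stands for the address of your Atlan instance, which the pattern suggests but does not confirm.

```python
from datetime import date


def log_prefix(product_url: str, day: date, bucket: str = "cluster-logs") -> str:
    """Build the S3 prefix for one day's logs from the documented path pattern."""
    # Documented pattern: s3://cluster-logs/product-url/yyyy/mm/dd/
    return f"s3://{bucket}/{product_url}/{day:%Y/%m/%d}/"


if __name__ == "__main__":
    # "atlan.example.com" is a placeholder, not a real instance address.
    print(log_prefix("atlan.example.com", date(2021, 3, 7)))
```

The resulting prefix can be passed to any S3 listing tool, for example `aws s3 ls`.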

Note: Logs are retained for the last 14 days on the standard tier. After that, they are moved to cold storage (Glacier). Logs are deleted from cold storage after 90 days.
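As a rough aid for working out where a given day's logs should be, the retention note can be expressed as a small function. This is a sketch under one stated assumption: the 90 days are counted from the move to cold storage, so a log disappears about 104 days after it was written. If the 90 days are instead counted from creation, adjust the constant accordingly. The function name `storage_tier` is hypothetical.

```python
from datetime import date

STANDARD_DAYS = 14  # from the note: kept on the standard tier for the last 14 days
COLD_DAYS = 90      # from the note: deleted from cold storage after 90 days


def storage_tier(written: date, today: date) -> str:
    """Return where a log written on `written` is expected to be on `today`."""
    age = (today - written).days
    if age < 0:
        raise ValueError("written date is in the future")
    if age <= STANDARD_DAYS:
        return "standard"
    # Assumption: the cold-storage period starts when the log leaves the standard tier.
    if age <= STANDARD_DAYS + COLD_DAYS:
        return "glacier"
    return "deleted"
```

For example, a log written on 1 January would be reported as `standard` on 10 January and as `glacier` on 1 February.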

Backup Policy

Atlan uses an object storage service such as S3 to store backups, based on the following backup strategies.

  • Cluster Backup: Velero is an open-source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes. Atlan uses Velero to keep a full cluster backup based on a defined policy.

  • Database Backup: Custom Kubernetes Jobs are scheduled on a defined policy to back up all databases to the object storage. Lifecycle: backups are retained for the last 14 days on the standard tier, after which they are moved to cold storage (Glacier). Backups are deleted from cold storage after 90 days.
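On S3, a retention policy like the one in the Database Backup item is typically expressed as a bucket lifecycle configuration. The sketch below builds such a configuration as a plain Python dictionary. It does not describe how Atlan actually configures its buckets: the rule ID and the `backups/` prefix are hypothetical, and it assumes the 90-day cold-storage period starts at the transition, so expiry falls 104 days after an object is created.

```python
import json


def backup_lifecycle_config(prefix: str, standard_days: int = 14, cold_days: int = 90) -> dict:
    """Build an S3 lifecycle configuration that mirrors the documented policy."""
    return {
        "Rules": [
            {
                "ID": "backup-retention",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to Glacier once they leave the standard tier.
                "Transitions": [{"Days": standard_days, "StorageClass": "GLACIER"}],
                # Assumption: cold-storage time is counted from the transition,
                # so expiry (measured from creation) is the sum of both periods.
                "Expiration": {"Days": standard_days + cold_days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(backup_lifecycle_config("backups/"), indent=2))
```

A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument.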
