Backups

How Atlan runs backup jobs and where they are stored

Elasticsearch

To create an Elasticsearch backup, we use a snapshot repository registered via the Elasticsearch API. The repository is configured with the S3 bucket, region, and IAM role required for the backup process.
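
For illustration, registering such a repository looks roughly like this. The repository name, bucket, and region below are placeholders, the exact settings accepted depend on the Elasticsearch version and S3 repository plugin in use, and the IAM role configuration is omitted because it varies with the deployment:

# Register an S3 snapshot repository (names and values are illustrative)
curl -X PUT "http://elasticsearch:9200/_snapshot/s3_backup" \
  -H 'Content-Type: application/json' \
  -d '{
        "type": "s3",
        "settings": {
          "bucket": "development-12451532-backup",
          "base_path": "es-backup",
          "region": "us-east-1"
        }
      }'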

In brief, a CronJob calls an Elasticsearch API endpoint, which creates a snapshot of the current state of the Elasticsearch cluster (i.e. all the indices). This snapshot is uploaded to the S3 bucket specified in the snapshot repository.
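
The snapshot call the CronJob makes is roughly equivalent to the following (the repository and snapshot names are illustrative):

# Create a snapshot of all indices and wait for it to complete
curl -X PUT "http://elasticsearch:9200/_snapshot/s3_backup/snapshot-2021-02-02?wait_for_completion=true"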

This CronJob runs at 03:00 UTC every day. The backups are stored in the S3 bucket under the path /es-backup/.
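
To confirm the schedule of the CronJob, list the CronJobs in the atlas namespace:

kubectl get cronjobs -n atlas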

To get the logs of a CronJob run, use this command:

kubectl logs atlas-elasticsearch-master-backup-XXXX -n atlas

🧙‍♀️ Remember: The backup process is incremental, i.e. each new snapshot builds upon the previous one.
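
To see the chain of snapshots that has accumulated in the repository, you can query Elasticsearch directly (the repository name s3_backup is a placeholder):

# List all snapshots in the repository, one per line
curl "http://elasticsearch:9200/_cat/snapshots/s3_backup?v"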

Cassandra

We use a utility called Cain to back up and restore Cassandra. This utility performs backup and restore operations for Cassandra running in Kubernetes. You can find out more about Cain here.

We have a Kubernetes CronJob running, which uses the nuvo/cain image to launch a pod, create a snapshot of the Cassandra cluster's data, and then upload that snapshot to the specified S3 bucket under the specified path (e.g. s3://development-12451532/backup/cassandra/).
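
The command run inside that pod is along the lines of the sketch below; the selector and keyspace are assumptions, and the flag names follow the Cain README:

# Snapshot the keyspace on every Cassandra pod and upload it to S3
cain backup \
  --namespace atlas \
  --selector app=cassandra \
  --keyspace atlas_keyspace \
  --dst s3://development-12451532/backup/cassandra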

This CronJob runs at 03:00 UTC every day. The backups are stored in the S3 bucket under the path /backup/cassandra/.

To get the logs of a CronJob run, use this command:

kubectl logs atlas-cassandra-backup-atlas-XXXX -n atlas

Postgres

We use a custom container image that takes the S3 bucket name as input, takes a dump of Postgres (all databases), compresses it, and stores it in the S3 bucket under the path /postgres/.
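
A minimal sketch of what that image does, assuming pg_dumpall and the AWS CLI are available inside the container (the host, user, and bucket name are illustrative):

# Dump all databases, compress the dump, and upload it to S3
TIMESTAMP=$(date +%F_%H-%M-%S)
pg_dumpall -h postgresql -U postgres | gzip > "postgres-backup-${TIMESTAMP}.gz"
aws s3 cp "postgres-backup-${TIMESTAMP}.gz" "s3://development-12451532-backup/postgres/"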

We have a CronJob that uses this image to launch a container and perform the backup. This CronJob runs at 03:00 UTC every day. The backup file at S3 is named postgres-backup-<date and time at which the file was created> (e.g. postgres-backup-2021-02-02_05-58-24.gz).

To get the logs of a CronJob run, use this command:

kubectl logs postgresql-postgresql-master-backup-XXXX -n postgresql

Cluster backup using Velero

Velero is an open source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes. We have set up schedules to automatically kick off backups at recurring intervals.
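
For reference, a daily 03:00 UTC schedule can be created with the Velero CLI like this (the schedule name is illustrative):

# Create a recurring backup that fires at 03:00 UTC every day
velero schedule create atlan-daily --schedule "0 3 * * *"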

Along with Velero, we have integrated Restic to create snapshots of all the persistent volumes present in the Kubernetes cluster.
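
When Restic runs in opt-in mode, a pod's volumes are included in the backup by annotating the pod; the pod and volume names below are assumptions:

# Mark the "data" volume of a pod for Restic backup
kubectl -n atlas annotate pod/atlas-cassandra-0 backup.velero.io/backup-volumes=data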

These backups are created at 03:00 UTC. The backups and snapshots created by Velero and Restic are stored in the backup S3 bucket under the path /atlan/ (e.g. s3://development-12451532-backup/atlan/).
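
You can also inspect the backups themselves with the Velero CLI; the backup name passed to the second command is whatever the first one reports:

# List all backups with their status, then fetch the logs of a specific one
velero backup get
velero backup logs <backup-name>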

To check the logs of Velero, use this command:

kubectl logs velero-XXXX -n velero

To check the logs of Restic, use this command:

kubectl logs restic-XXXX -n velero