Kubernetes Snapshots by Example

An example of using Kubernetes snapshots.

The examples for this article are available for download. Also, while these examples are specific to Google Kubernetes Engine (GKE), the same concepts apply to other cloud providers, e.g., Amazon Web Services (AWS) and its Elastic Kubernetes Service (EKS).

Say we have a Kubernetes workload that requires a large (tens of GB) download of files to disk before becoming ready. The natural solution would be to have the workload perform the download itself. The downside of this approach is that the download takes time, delaying the workload's readiness. The impact of this delay is multiplied when the workload is deployed with many replicas; a deployment that takes hours to roll out is generally unacceptable.

If only we had a way of creating a copy of the files without having to download them… Hello snapshots.

You can create snapshots from disks even while they are attached to running instances. Snapshots are global resources, so you can use them to restore data to a new disk or instance within the same project.

— GCP — Working with persistent disk snapshots

Before exploring Kubernetes snapshot resources, one needs to have a basic understanding of Kubernetes storage resources. The article Kubernetes Storage By Example: Part 1 (and 2) provides such an introduction.

The improved solution will be to have a separate workload, producer, that downloads the files to a PersistentVolume backed by a PersistentVolumeClaim. We then dynamically provision Kubernetes snapshot resources from that PersistentVolumeClaim. These Kubernetes snapshot resources are then used as the source for PersistentVolumeClaims used by a consumer workload (one per replica).

Producer

We start by creating a PersistentVolumeClaim, data, that dynamically provisions a backing PersistentVolume; here using the Compute Engine persistent disk CSI driver.
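A minimal sketch of such a PersistentVolumeClaim; the size is illustrative, and standard-rwo is the GKE StorageClass backed by the Compute Engine persistent disk CSI driver (consistent with the kubectl output later in the article):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes:
    - ReadWriteOnce
  # GKE StorageClass backed by the Compute Engine persistent disk CSI driver
  storageClassName: standard-rwo
  resources:
    requests:
      storage: 10Gi
```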

We then create the producer workload that mounts the PersistentVolume backed by the PersistentVolumeClaim.
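A sketch of the producer workload as a bare Pod; the image and command are illustrative placeholders (any long-running container that mounts the claim at /data would do):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: producer
spec:
  containers:
    - name: producer
      image: debian:bullseye  # illustrative image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data  # the PersistentVolumeClaim created above
```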

Finally, we login to the producer workload to download the files; here we simulate this by creating a file: file.txt.

$ kubectl exec -it producer -- /bin/bash
root@producer:/# echo 'Hello World!' > /data/file.txt

Snapshot Resources

Much like a StorageClass, the VolumeSnapshotClass provides the configuration needed when dynamically provisioning snapshot resources. In this example, we indicate that we are using the Compute Engine persistent disk CSI Driver and that the VolumeSnapshotContent object is to be deleted when the VolumeSnapshot it’s bound to is deleted (more on this later).
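A sketch of such a VolumeSnapshotClass; the metadata name is a hypothetical choice, while pd.csi.storage.gke.io is the Compute Engine persistent disk CSI driver:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: snapshot-class  # hypothetical name
# the Compute Engine persistent disk CSI driver
driver: pd.csi.storage.gke.io
# delete the VolumeSnapshotContent when its VolumeSnapshot is deleted
deletionPolicy: Delete
```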

We then create a VolumeSnapshot of the PersistentVolumeClaim (really of the backing PersistentVolume).
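A sketch of the VolumeSnapshot, referencing the data PersistentVolumeClaim as its source (the class name assumes the hypothetical VolumeSnapshotClass above):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-2021-07-15
spec:
  volumeSnapshotClassName: snapshot-class  # hypothetical class name
  source:
    persistentVolumeClaimName: data
```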

A VolumeSnapshot is a request for snapshot of a volume by a user. It is similar to a PersistentVolumeClaim.

— Kubernetes — Volume Snapshots

Please note: Much like the relationship between PersistentVolumeClaims and PersistentVolumes, VolumeSnapshots are backed by VolumeSnapshotContents.

We can describe the VolumeSnapshot to verify that it is ready:

$ kubectl describe volumesnapshot data-2021-07-15
...
Status:
  Bound Volume Snapshot Content Name:  snapcontent-04e67a76-d1d3-4541-a57f-96ad52845e45
  Creation Time:                       2021-07-16T14:33:56Z
  Ready To Use:                        true
  Restore Size:                        10Gi
...

Demo

Before we get to our consumer workload, we walk through a simpler example. Here we dynamically provision a PersistentVolume via a PersistentVolumeClaim that uses the VolumeSnapshot as its source.
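A sketch of the demo PersistentVolumeClaim; the dataSource field is what restores the claim from the snapshot:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard-rwo
  # restore the volume's contents from the VolumeSnapshot
  dataSource:
    name: data-2021-07-15
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  resources:
    requests:
      storage: 10Gi  # must be at least the snapshot's restore size
```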

After a minute or so, we can confirm that the PersistentVolumeClaim and backing PersistentVolume are created.

$ kubectl get pvc
NAME   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data   Bound    pvc-69fb87f1-ceeb-4a50-b7be-4b989ffeac4a   10Gi       RWO            standard-rwo   47m
demo   Bound    pvc-c9d2e792-0b85-474c-8c3f-80c261cb7a3f   10Gi       RWO            standard-rwo   54s

We then create the demo workload that mounts the PersistentVolume backed by the PersistentVolumeClaim.
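A sketch of the demo workload, analogous to the producer Pod (image and command are again illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: demo
      image: debian:bullseye  # illustrative image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo  # the snapshot-sourced PersistentVolumeClaim
```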

Finally, we can verify the existence of file.txt on the demo workload:

$ kubectl exec -it demo -- cat /data/file.txt
Hello World!

Consumer

Here we use a StatefulSet that automates the provisioning of PersistentVolumes through a PersistentVolumeClaim template that uses the VolumeSnapshot as its source.
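A sketch of the consumer StatefulSet; the image, command, and labels are illustrative, while the names match the kubectl output below (each replica gets a claim named data-consumer-2021-07-15-N from the data claim template):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: consumer-2021-07-15
spec:
  serviceName: consumer  # the backing (headless) Service
  replicas: 3
  selector:
    matchLabels:
      app: consumer
  template:
    metadata:
      labels:
        app: consumer
    spec:
      containers:
        - name: consumer
          image: debian:bullseye  # illustrative image
          command: ["sleep", "infinity"]
          volumeMounts:
            - name: data
              mountPath: /data
  # one PersistentVolumeClaim per replica, each restored from the snapshot
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: standard-rwo
        dataSource:
          name: data-2021-07-15
          kind: VolumeSnapshot
          apiGroup: snapshot.storage.k8s.io
        resources:
          requests:
            storage: 10Gi
```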

Please note: StatefulSets require a backing Service.
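A sketch of that backing Service, here headless (clusterIP: None); the selector label and port are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: consumer
spec:
  clusterIP: None  # headless; common with StatefulSets
  selector:
    app: consumer  # illustrative label
  ports:
    - port: 80  # illustrative port
```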

After a minute or so, we can confirm that three additional PersistentVolumeClaims (names prefixed with data-consumer-2021-07-15) and backing PersistentVolumes are created (one for each StatefulSet replica).

$ kubectl get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data                         Bound    pvc-69fb87f1-ceeb-4a50-b7be-4b989ffeac4a   10Gi       RWO            standard-rwo   56m
data-consumer-2021-07-15-0   Bound    pvc-47058fe7-800c-4e75-9a85-abedcdfd5447   10Gi       RWO            standard-rwo   3m11s
data-consumer-2021-07-15-1   Bound    pvc-ba6ec5a9-0e7a-455c-99d9-ee1f36ebf77e   10Gi       RWO            standard-rwo   2m27s
data-consumer-2021-07-15-2   Bound    pvc-87940c81-62c8-4dde-a5f9-93a2811543fb   10Gi       RWO            standard-rwo   107s
demo                         Bound    pvc-c9d2e792-0b85-474c-8c3f-80c261cb7a3f   10Gi       RWO            standard-rwo   9m31s

As with the demo workload, we can verify the existence of file.txt on one of the workload’s replicas:

$ kubectl exec -it consumer-2021-07-15-0 -- cat /data/file.txt
Hello World!

Extra Thing

In the previous scenario, the set of files in the download was static; how would we handle the situation where those files are updated (say daily)?

We might guess that we can simply create a new VolumeSnapshot, e.g., data-2021-07-16, with the updated files and change the StatefulSet to use it; we would be wrong.

The StatefulSet "consumer-2021-07-15" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden

One solution would be to create a new StatefulSet, e.g., consumer-2021-07-16, that uses the new VolumeSnapshot (data-2021-07-16). Here we can still use the same backing Service (consumer).

Once the new StatefulSet is ready, we can delete the old StatefulSet (and its PersistentVolumeClaims). We can then delete the old VolumeSnapshot.

Ultimately, we would likely automate this entire process; this is outside the scope of this article.

One Last Thing

For the consumer workload, we created the backing Service as a headless Service, i.e., clusterIP: None; this is common with StatefulSets. Depending on the use case, e.g., if all the replicas are interchangeable, it is also perfectly acceptable to omit the clusterIP field so that the Service gets a cluster IP address, internal load balancing, and a DNS name.
