Background
This is the seventh part of the series on building highly scalable multi-container apps using AKS. So far in the series we have covered the following topics:
- Part 1: Provision a managed Kubernetes cluster using Azure Container Service (AKS)
- Part 2: Understand basic Kubernetes objects - Kubernetes Namespace
- Part 3: Understand basic Kubernetes objects - Pod and Deployment
- Part 4: Understand Kubernetes object - Service
- Part 5: Understand Kubernetes object - init containers
- Part 6: Manage Kubernetes storage using Persistent Volume (PV) and Persistent Volume Claim (PVC)
This post is about externalizing the state associated with the SQL Server 2017 Linux container to an external volume, which we provisioned in the previous post. We will be focusing on the following topics:
- Create a StatefulSet Kubernetes object which will be responsible for creating the SQL Server 2017 Linux container
- Externalize the state by means of volume mapping to utilize the storage which was provisioned in the previous post
- Query and update data from SQL Server 2017 Linux container
- Verify data is persisted if the container restarts
- Verify the volume mount using Kubernetes cli
Let's get the ball rolling.
Create a StatefulSet Kubernetes object
Kubernetes provides an object of type StatefulSet, which is often used when deploying database workloads that store their state on external storage providers. StatefulSets are used when data loss is unacceptable. They provide stable volume mounts across restarts of pods, or even of the cluster.
Let's see how we can create a StatefulSet. We will use the manifest file shown below to declaratively set the different properties.
We specify the kind as StatefulSet. There is one other difference compared to a Deployment manifest: for a StatefulSet, we need to specify the serviceName attribute. The rest of the attributes, like metadata and spec, are similar to the other objects we have created in this series.
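To make this concrete, here is a minimal sketch of what such a StatefulSet manifest could look like. This is an illustrative reconstruction, not the exact file from the repo: the object names, labels and environment variables are assumptions, while the image is the official SQL Server 2017 Linux image.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: sql2017                       # illustrative name
  namespace: abc2018sg
spec:
  serviceName: sql2017                # required attribute for a StatefulSet
  replicas: 1
  selector:
    matchLabels:
      app: sql2017
  template:
    metadata:
      labels:
        app: sql2017
    spec:
      containers:
      - name: sql2017
        image: microsoft/mssql-server-linux:2017-latest
        ports:
        - containerPort: 1433         # default SQL Server port
        env:
        - name: ACCEPT_EULA
          value: "Y"
        - name: SA_PASSWORD
          value: "<strong-password>"  # in practice, source this from a Kubernetes Secret
```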
Externalize the state by means of volume mapping
Before we can deploy the StatefulSet to the Kubernetes cluster using kubectl, we need to make a couple of other changes to ensure that the volume is mapped correctly. In the same yaml file, scroll towards the bottom and pay attention to the volumeMounts and volumes sections.
Let's first look at the volumeMounts section. Here we specify mssqldb as the name of the volume to mount. The mountPath specifies which path inside the container is mapped to the volume named mssqldb. The SQL Server 2017 Linux container stores its data files inside the /var/opt/mssql directory, so we have mapped this path inside the container to an external volume.
Next is the link between the volume and the physical storage persisted to the Azure Disk. This association is made in the volumes section. We specify exactly the same name that was assigned to the PVC in the previous post, techtalks-data.
The beauty of this is that the container does not know what the underlying storage provider is. In the future we can swap the Azure Disk for some other provider, and the container does not have to worry about it. As long as the PVC name matches the claimName, the container will continue to run.
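Putting the two sections together, the relevant fragment of the manifest would look roughly like this. The volume name mssqldb, the mount path and the claim name techtalks-data all come from the discussion above; the indentation assumes these sections sit inside the pod template spec:

```yaml
        volumeMounts:
        - name: mssqldb
          mountPath: /var/opt/mssql        # where SQL Server keeps its data files
      volumes:
      - name: mssqldb
        persistentVolumeClaim:
          claimName: techtalks-data        # PVC created in the previous post
```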
We also expose the service via a load balancer. The other manifest files used during this series can be found in the GitHub repo. Navigate to the TechTalksDB folder and run the kubectl apply command to create the objects in the Kubernetes cluster. Below is the list of commands I executed to deploy the TechTalksDB, TechTalksAPI and TechTalksWeb containers.
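Assuming one manifest file per component (the file names below are illustrative; the actual names are in the TechTalksDB folder of the repo), the sequence would be something like:

```shell
kubectl apply -f techtalksdb.yml --namespace abc2018sg
kubectl apply -f techtalksapi.yml --namespace abc2018sg
kubectl apply -f techtalksweb.yml --namespace abc2018sg
```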
Query and update data from SQL Server container
Let's query the data using SQL Operations Studio. The initialization script contains 3 records. I added one more before taking the screenshot, so at this point we have 4 records in the TechTalk table.
Let's add a new record and query it back.
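As a sketch, the T-SQL would look something like the following. The column names are assumptions, since the actual schema lives in the initialization script in the repo:

```sql
-- Column names are illustrative; check the init script for the real schema
INSERT INTO TechTalk (Title, Speaker)
VALUES ('Stateful workloads on AKS', 'Jane Doe');

SELECT * FROM TechTalk;
```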
Verify Data is persisted on container restart
This is where things start to get interesting. Let's delete the pod which is running the SQL Server 2017 container. We can delete the pod in multiple ways. Earlier in the series I demonstrated how to interact with Kubernetes objects using the kubectl CLI. To keep things simple, let's use the Kubernetes dashboard GUI to delete the pod.
Click on the Delete option as shown above. We will be asked for confirmation; go ahead and confirm the deletion.
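The same thing can be done from the CLI. StatefulSet pods get stable ordinal names of the form &lt;name&gt;-&lt;ordinal&gt;, so assuming the StatefulSet is called sql2017 (an illustrative name), the command would be:

```shell
kubectl delete pod sql2017-0 --namespace abc2018sg
```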
As soon as the pod is deleted, the StatefulSet controller starts another instance of the pod. Any guess why? Go back to the manifest file and check the number of replicas. We have specified the number of replicas as 1. This is the desired state that we requested Kubernetes to maintain. As soon as the replica count dropped to zero, the controller initiated the creation of a new pod to bring the actual state back to the desired state.
For a few seconds, while the old pod is terminating and the new one is initializing, we will see some errors.
Give it a few seconds to stabilize and within a minute we can see that things are back to normal with the new pod successfully running.
The pod is restarted, but does that mean we did not lose the data which was persisted? When I queried the table again using SQL Operations Studio, I got the results with all 5 rows intact. To save some space, I will not put the same screenshot again. You can give it a try yourself as homework.
We can inspect the logs of the newly created pod to see what happened in the background.
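Assuming the same illustrative pod name as above, the commands to find the new pod and inspect its logs would look like:

```shell
kubectl get pods --namespace abc2018sg
kubectl logs sql2017-0 --namespace abc2018sg
```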
Verify the volume mount using Kubernetes cli
We can also look at the different events that took place at the cluster level. Run the command kubectl get events --namespace abc2018sg and we can see the list of events. Check the sequence of events in the screenshot, which shows what happened after the pod was killed and a new one was created.
I have highlighted 3 main areas above. The first is the deletion of the sql2017 pod. The second is the recreation of the stateful container. And the third is the volume mount, which is the real game changer: if the volume were not there, we would not be able to recover the data.
Conclusion
During the course of this post we saw how StatefulSets can mount external volumes using persistent volume claims, and that data is persisted across container restarts. StatefulSets provide a stable identity for containers which need persistent storage, as in the case of a database container. By externalizing the data from the container into persistent volumes, we ensure that even if the cluster is restarted, we can still mount the volumes back to the container and recover the data.
All that I demonstrated here is also covered as part of the AKS learning series. The video recording of the series is available on YouTube. Thanks to Engineers.SG who are helping to record this series of talks.
As always, the code used during this post is available on GitHub.
Until next time, Code with Passion and Strive for Excellence.