Tutorial Highlights & Transcript
00:00 - Kubernetes Deployment Setup
Here I have a super simple deployment object in Kubernetes. I have set up 20 replicas. I have the strategy commented out for now because I want to show you how the default works. It’s a super simple container: a Node.js application that starts a server, because we need it to keep running. I’ll show you the custom things it has as we get to them. For now, I’m just going to apply this file. Now if we go to our cluster, we see our 20 pods starting up, almost done, and the last one should be ready soon. There we go. If this were a real application serving traffic, then at some point we would need to do a deployment to release a new version. Kubernetes comes with a bunch of built-in features that allow us to perform those tasks with zero downtime. If we just make a change and apply the deployment, Kubernetes itself will do it as best it can without any downtime.
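For reference, the manifest described here might look roughly like the sketch below. It is not the exact file from the video; the name, labels, and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker                  # hypothetical name
spec:
  replicas: 20
  # strategy is commented out for now, so the default RollingUpdate
  # behavior (25% maxSurge, 25% maxUnavailable) applies:
  # strategy:
  #   type: RollingUpdate
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
        - name: worker
          image: my-registry/node-worker:v1   # placeholder Node.js image
```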
01:44 - Running the Kubernetes Deployment
I ran into this specific requirement, right? It was not for a traffic-serving application, not for a web application; it was for a back-end worker running in Kubernetes. It doesn’t need to expose a port or anything. It’s just a worker doing stuff in the background. Let’s assume that’s what this is: this is just the back-end worker. And again, same situation, I need to update it. But the thing with workers is that they will always be in the middle of some task while they are running. If I tell Kubernetes I want it to replace version one with version two, it will first send the termination signal to the container. If the container doesn’t exit gracefully within a specified timeout window, then it will just kill it. For workers that might be a problem. In those cases, you might want to implement a graceful termination strategy, and that’s what I’m trying to simulate here.
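That “specified timeout window” is the pod’s terminationGracePeriodSeconds. A minimal sketch of where it lives in the pod spec, using the placeholder image from before (30 seconds is the default if you don’t set it):

```yaml
spec:
  # On pod deletion, the kubelet sends SIGTERM to the container, waits up
  # to terminationGracePeriodSeconds, and then sends SIGKILL.
  terminationGracePeriodSeconds: 30   # the default
  containers:
    - name: worker
      image: my-registry/node-worker:v1   # placeholder image
```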
05:06 - Simulating a Graceful Termination Strategy
Let’s simulate that. I’m going to say that my timeout to terminate is one hour, just to make it super obvious. This means I’m giving it one hour to complete its task, and if it doesn’t finish, then I’m going to terminate it. Okay, so let’s see if this one is changing. Yeah, it’s changing. So now something different will happen, right? These ones are just finishing up. Now I have my graceful termination configured, and I’m going to release a new version. Let’s apply the deployment. It is almost done. If you see here, we have 39; we have almost twice the capacity. I think it’s not reaching 40 because I don’t have enough nodes to hit 40. Let’s say we have 20 running here, all running correctly, on the new version, version three. But because we now have a graceful termination, you can see another 20 that are stuck in terminating. For these, I built a custom script so they would capture the termination signal and then exit at random times. If we take a look at one and check the logs, it says it got the termination signal and is going to wait eight minutes, to simulate that it’s doing something before exiting.
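Putting that together, the pod spec for this simulation might look something like the sketch below. My actual script was Node.js; here a shell trap stands in for it, which is enough to show the behavior:

```yaml
spec:
  terminationGracePeriodSeconds: 3600   # one hour, to make it super obvious
  containers:
    - name: worker
      image: busybox   # placeholder; the real container runs a Node.js app
      command: ["sh", "-c"]
      args:
        - |
          # Stand-in for the custom script: trap the termination signal,
          # then wait a random number of minutes (1-9) before exiting,
          # as if finishing an in-flight task.
          on_term() {
            delay=$(awk 'BEGIN { srand(); print (int(rand() * 9) + 1) * 60 }')
            echo "got SIGTERM, exiting in $delay seconds"
            sleep "$delay"
            exit 0
          }
          trap on_term TERM
          while true; do sleep 1; done
```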
This is the problem I ran into. Basically, my workers were finishing up a task, but when we did a new deployment, because the termination grace period was so long, we ended up with twice the capacity in the cluster while all the old containers finished. That time was variable, because we could have some containers running a task that would take half an hour to finish and others that would take five minutes. But the problem remains that while the cluster was in this state, it was consuming twice the expected capacity. At this scale it’s not a problem, because this is just a simple cluster; I have three nodes here and my deployment is configured for 20 replicas, so at worst we briefly run 40 pods instead of 20. But in our production case, we had around 250 containers. When something like this happens in production, that means we have 250 containers running the new version and close to 250 containers running the old version, waiting to finish up their tasks. Because of that jump in capacity, auto-scaling triggered and we scaled up a bunch of nodes. So while the old version hadn’t finished, we were consuming twice the EC2 instances. Then, when the old pods were finally done with their tasks, we ended up with extra nodes because Kubernetes wouldn’t scale them all the way back down. What we wanted was to do this same process in a more controlled manner. That’s where the strategies come in.
10:14 - Configuring Termination Strategies
Let’s do version four, just to keep track, and apply it. Now we have our 20 containers starting up. Okay, so now, with the rolling update strategy configured, if I apply this I would expect it to add two more containers, wait for those to be ready, terminate two, and then repeat: keep adding two and deleting two until it’s done. Let’s take a look at that. And we’re done; we have 40. The same thing happened again: I have 20 terminating and 20 running. The only thing that changed is that it started them two at a time, because that’s how I configured it. But we still ended up with twice the capacity running in the cluster. I investigated this a little and found out that the Deployment object in Kubernetes doesn’t count terminating pods as part of the capacity. As soon as a pod enters the terminating state, it no longer counts toward the deployment, and because of that, it doesn’t count toward the maxSurge limit that I have here. I also found that there is no way to make it count those terminating pods, so I always ended up with the same problem: my cluster was running at twice the capacity for close to an hour. That’s what we wanted to avoid. That brought me to realize that a Deployment was not what I wanted to use here. So I’m going to delete this one and force-terminate everything to clean it up. What I wanted to do was not possible with a regular Deployment.
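For reference, the strategy block I’m talking about looks roughly like this sketch (maxSurge of 2 matches what I configured; maxUnavailable of 0 is my assumption, since the rollout only ever added pods before removing any):

```yaml
spec:
  replicas: 20
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2         # start at most two new pods above the desired count
      maxUnavailable: 0   # keep all 20 replicas available during the update
```

The catch, as described above, is that terminating pods fall outside these counts, so maxSurge does nothing to limit them.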
13:45 - Deploying a StatefulSet
Let’s do the same exercise and release a new version; again, you’ll see the difference. This time, if I order them by age, you can see it better. See how it’s only terminating one, and nothing else is happening: it’s not starting new ones and it’s not terminating more than one at a time. This will take a while. As I mentioned before, this container gets the signal and then decides to wait a random time to simulate work. If we check the logs, we see that this one is going to wait seven minutes. That means we would be waiting here seven minutes for this one to terminate; then it would replace it and move on to number 17, terminate that one, restart another 17, and continue one by one. That is almost what I wanted to implement, except that it’s too slow, because as I mentioned, in the real production scenario I have 250 workers. If each of those takes 10 minutes to exit, we still end up waiting a long time. This is almost the feature I want, but it’s too slow. We are going to have to end with bad news, because, remember, on the Deployment we had this configuration: for a Deployment object, we can configure a rolling update with a maxSurge and a maxUnavailable. For a StatefulSet, sadly, we cannot do that yet. This one-by-one rollout is the best and fastest we can get, and the only thing left to improve is the time the worker itself takes to exit.
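For reference, the StatefulSet version of the worker might look something like this sketch (same placeholder names and image as before):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: worker                # hypothetical name
spec:
  replicas: 20
  serviceName: worker         # governing (headless) service; required for StatefulSets
  selector:
    matchLabels:
      app: worker
  updateStrategy:
    type: RollingUpdate       # replaces one pod at a time, highest ordinal first
  template:
    metadata:
      labels:
        app: worker
    spec:
      terminationGracePeriodSeconds: 3600
      containers:
        - name: worker
          image: my-registry/node-worker:v5   # placeholder for the new version
```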
The feature will be available. I mean, it’s already available, just not on EKS, because it’s a Kubernetes 1.24 feature, and it’s still alpha in that version. EKS right now goes up to 1.23, so we still cannot use it, but 1.24 is supposed to be available on EKS by the end of the year. By then we could test this feature out, which allows us to configure a maxUnavailable on a StatefulSet’s rolling update. Instead of going one by one, terminating one at a time, we could configure it to go 10 at a time and do more controlled worker updates, but also faster than with a regular StatefulSet.
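Once it’s available, the configuration should look something like the sketch below: the alpha maxUnavailable field on the StatefulSet update strategy, which in 1.24 sits behind the MaxUnavailableStatefulSet feature gate:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 10   # replace up to 10 pods at a time instead of 1
```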
Carlos Rodríguez
DevOps Team Lead
nClouds
Carlos has been a Senior DevOps Engineer at nClouds since 2017 and works with customers to build modern, well-architected infrastructure on AWS. He has a long list of technical certifications, including AWS Certified DevOps Engineer - Professional, AWS Certified Solutions Architect - Professional, and AWS Certified SysOps Administrator - Associate.