Kubernetes Health Checks: A Guide to Probes

Ensuring the health and resilience of your applications has become a necessity rather than an option, especially when managing complex infrastructure such as Kubernetes.

Kubernetes, the industry-leading container orchestration tool, offers mechanisms to implement robust health checks for your applications. These checks, termed “probes,” act as guardians of your application’s well-being, continuously monitoring the health status of your pods and their hosted applications. This provides a more streamlined, automated, and reliable system.

Why Should You Monitor Application Health?

Monitoring your application’s health is crucial for maintaining system stability, efficiency, and user satisfaction. No application is immune to potential errors or issues—whether it’s unexpected server downtime, performance bottlenecks, a sudden influx of user traffic, or unpredictable runtime errors. These glitches can disrupt the smooth operations of your applications and might lead to significant delays or losses. Thus, monitoring is essential, regardless of whether your application is hosted on Kubernetes or another platform.

By closely monitoring your applications’ health, you can ensure users experience a seamless and reliable service. But the advantages extend beyond that. Monitoring provides data-driven insights, facilitating informed decision-making and responsive adjustments to changes in your application’s environment. It keeps you informed about your application’s usage patterns and offers insights into user demand, helping you decide when to scale your application infrastructure in response to usage peaks.

Furthermore, monitoring enables automated responses to specific scenarios, such as deploying additional instances when application loads increase or replacing malfunctioning instances.

Staying proactive, adapting to demand, and delivering a consistent, high-performing, user-friendly application are pillars of successful application management.

How Can Probes Help?

In my experience, utilizing health checks has significantly enhanced the reliability of my applications in Kubernetes. I’ve been proactive in addressing issues before they escalate, maintaining high availability even amidst potential disruptions. The analogy I like is that of a good insurance policy: you hope you’ll never require it, but when something goes awry, you’re grateful for its presence.

While I might not have experienced a catastrophic system failure due to absent health checks, the smooth operation of my applications can be credited largely to these safeguards. They might not be center stage when everything progresses without hitches, but their foundational work is what ensures this uninterrupted flow. For instance, when I used Mongoose in my Node app to connect with a MongoDB database, I faced a setup duration before the database was fully operational. I then configured a readiness probe to ensure my application wouldn’t process traffic until the database connection was securely established. Additionally, I incorporated a startup probe to account for potential network lags that might prolong the connection phase.
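For illustration, the probe pair for that scenario might look like the following sketch. This is only a sketch: the /ready route is a hypothetical endpoint that reports whether the Mongoose connection is established, and the timings are arbitrary; probe types and their options are covered in detail below.

readinessProbe:
  httpGet:
    path: /ready   # hypothetical route returning 200 once the DB connection is up
    port: 3000
  periodSeconds: 5
startupProbe:
  httpGet:
    path: /ready
    port: 3000
  failureThreshold: 12   # allows up to 12 attempts, 5s apart, for slow connections
  periodSeconds: 5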

Kubernetes inherently offers an automated self-healing capability, enhancing its efficacy in managing containerized applications. This built-in function empowers Kubernetes to autonomously detect and rectify issues within pods: restart an unresponsive container, roll out updates seamlessly with zero downtime, and even roll back to a prior stable version of your application if necessary.

While the self-healing abilities of Kubernetes ensure a degree of resilience, layering on additional monitoring can bolster your system even more. This is where Probes become pivotal.

Probes supply Kubernetes with an in-depth grasp of your application’s health, granting the context necessary for informed decision-making and appropriate actions. With probes, Kubernetes gains insights not merely at the pod or container level but also regarding the specific state of individual applications housed within those containers.

Probes enhance Kubernetes’ self-repair features and are categorized into three types: liveness probes, readiness probes, and startup probes. The following is a list of configuration options applicable to each probe:

Configuration Option | Description | Default value
initialDelaySeconds | Number of seconds to wait after the container starts before a probe is initiated. | 0
periodSeconds | How often (in seconds) to perform the probe. | 10
timeoutSeconds | Maximum time (in seconds) a probe may take to complete. | 1
successThreshold | Minimum consecutive successes for the probe to be considered successful after having failed. | 1
failureThreshold | Number of consecutive probe failures before Kubernetes gives up; for a liveness probe, this means restarting the container. | 3

Each probe performs one of the following actions at the container level in the pod specification (a combined sketch follows the list):

  1. httpGet: This action executes an HTTP request for a health check. The response code should be within the 200 to 399 range.
  2. tcpSocket: This action opens a TCP socket for a health check. The pod is healthy if a connection can be established.
  3. exec: This action executes a command inside the container for a health check. The pod is healthy if the command returns with exit code 0, otherwise unhealthy.
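For concreteness, here is a sketch of each action in probe form. These are illustrative fragments, not part of this tutorial's app: the paths, ports, and command are placeholders, and each stanza sits under a container in a pod spec. Any of the three probe types can use any of the three actions.

# Sketches only; paths, ports, and the command are hypothetical.

# 1. httpGet: healthy if the response code falls within 200-399
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10

# 2. tcpSocket: healthy if a TCP connection can be established
readinessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 5

# 3. exec: healthy if the command exits with code 0
startupProbe:
  exec:
    command: ["cat", "/tmp/healthy"]
  failureThreshold: 30
  periodSeconds: 2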

You can visit the official Kubernetes documentation to find out more about how and when to configure these actions.

With this, we can now get into the practical use case of probes.

Using Probes in Kubernetes

Understanding the theory of probes in Kubernetes is one thing; applying that knowledge is another. This guide bridges the gap by taking you beyond the theory of probes to practical applications that can enhance your deployments.

You’ll gain a comprehensive understanding of these probes, their capabilities, and their practical applications.

This knowledge will help you enhance the resilience and responsiveness of your applications and fully utilize the potential of Kubernetes for seamless scaling and smart management of containerized workloads.

Prerequisites:

To effectively utilize this tutorial, you'll need a text editor, Kubernetes and kubectl installed on your Windows, Linux, or Mac machine, and an operational Kubernetes cluster, either local or cloud-based, with at least one node. For a local cluster, options like Minikube or Kind will work fine, providing straightforward and efficient cluster provisioning.

On the other hand, for cloud-based cluster setups, services like Amazon's EKS (Elastic Kubernetes Service), Google's GKE (Google Kubernetes Engine), or Microsoft's AKS (Azure Kubernetes Service) can be leveraged.

Regardless of your choice, ensure your cluster is up and running smoothly before proceeding. This will ensure you can readily apply the concepts and techniques demonstrated in this tutorial to your Kubernetes environment.

Configuring the Liveness Probe

The Liveness Probe is a mechanism employed by Kubernetes to determine if an application within a container is still running. If the liveness probe fails, Kubernetes will automatically restart the container. This feature is particularly beneficial for applications such as web servers, microservices, messaging, and database systems, where a simple restart can rectify many issues that may otherwise cause the application to crash.

To implement the liveness probe, we define a liveness command, a TCP socket, or an HTTP request that Kubernetes can use to check the health of the application.

Here is an example of a Liveness Probe that runs a command inside a container:

livenessProbe:
  exec:
    command:
      - cat
      - /tmp/health
  initialDelaySeconds: 5
  periodSeconds: 5

The liveness probe in the above example operates as follows:

  • Executes the cat /tmp/health command inside the container.
  • Uses the initialDelaySeconds field to specify the duration to wait before conducting the first check.
  • Defines the frequency of the health checks of the application with the periodSeconds field.
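To see this probe fail in practice, you could delete the file it checks; on the next check, cat exits with a non-zero code, and after failureThreshold consecutive failures (3 by default) Kubernetes restarts the container. The pod name below is a placeholder:

kubectl exec <pod-name> -- rm /tmp/health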

Let’s consider a more practical scenario, where we’ll deploy a Node.js container image with a memory leak. Memory leaks can degrade performance over time until the application becomes completely unresponsive.

In this scenario, we’re going to deploy this container image and equip it with a liveness probe. The objective here is to observe the liveness probe in action as it detects the memory leak automatically and restarts the application, thereby resolving the issue by clearing the memory temporarily and giving the system administrator ample time to identify and rectify the root cause of the memory leak permanently.

Create a “yaml” file called “node-app.yaml” and paste in the following configuration settings:

# node-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: node-app
  template:
    metadata:
      labels:
        app: node-app
    spec:
      containers:
        - name: nodejs-app
          image: mercybassey/memory-leak
          ports:
            - containerPort: 3000
          livenessProbe:
            httpGet:
              path: /health-check
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 5

The configuration above will create a Kubernetes Deployment object that launches a Node.js application designed to simulate a memory leak when subjected to a high volume of requests. The deployment consists of three replica pods, each running the Docker image mercybassey/memory-leak.

The liveness probe associated with each pod sends HTTP GET requests to the /health-check endpoint on port 3000 of the container, starting 15 seconds after the pod launches and repeating every 5 seconds thereafter.

Should the Node.js application become unresponsive (and hence, the liveness probe fail), Kubernetes will take corrective action by automatically restarting the container, which temporarily handles the memory leak until a permanent fix is implemented in the application's code.

To create this deployment, execute the following kubectl command:

kubectl apply -f node-app.yaml

Run the following kubectl commands to view the deployment and the pods created by the deployment:

kubectl get deployment
kubectl get pods
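Before moving on, you can optionally verify the health endpoint yourself by port-forwarding to the deployment and calling it from a second terminal. This assumes the image's /health-check route returns an HTTP 2xx response while the app is healthy:

kubectl port-forward deploy/node-app 3000:3000
curl -i http://localhost:3000/health-check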

Next, we need to expose the “node-app” deployment via a Kubernetes service object to facilitate HTTP GET requests to the pods.

Create a “yaml” file named “service.yaml” with the following configuration:

# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nodejs-service
spec:
  selector:
    app: node-app
  type: NodePort
  ports:
    - port: 3000
      targetPort: 3000
      nodePort: 30000

This configuration will create a Kubernetes Service object of the NodePort type, exposing the node-app deployment on port 3000 inside the cluster and on port 30000 on each node.
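Because the service is of type NodePort, the application is now reachable on port 30000 of any node's IP; the node IP below is a placeholder (on Minikube, minikube service nodejs-service --url prints a ready-made URL):

curl http://<node-ip>:30000/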

To create this service object, run the following kubectl command:

kubectl apply -f service.yaml

Run the following kubectl command to view the service:

kubectl get services

To observe the liveness probe in action, we need to direct a large volume of HTTP GET requests at the application. This can be accomplished using a load testing or HTTP benchmarking tool such as Siege, which we will deploy on our Kubernetes cluster.

Create a “yaml” file named “siege.yaml” with the following code:

# siege.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: siege
spec:
  replicas: 1
  selector:
    matchLabels:
      app: siege
  template:
    metadata:
      labels:
        app: siege
    spec:
      containers:
        - name: siege
          image: dockersec/siege
          command: ["siege", "-c", "40", "-t", "1M", "http://10.245.221.142:3000/"]

Considering the code above, a Kubernetes deployment object will be created using the dockersec/siege image, a pre-built image for the Siege load testing tool. The command specified in the command field tells Siege to run a load test with 40 concurrent users (-c 40) for one minute (-t 1M) against the specified URL (http://10.245.221.142:3000/), which in this case is the NodePort service exposing our Node.js deployment; replace this IP with your own service's cluster IP.
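As an aside, rather than hard-coding the service's cluster IP, you could let cluster DNS resolve the service by name, assuming the siege pod runs in the same namespace as the nodejs-service created earlier:

command: ["siege", "-c", "40", "-t", "1M", "http://nodejs-service:3000/"]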

Now use the following kubectl commands to create and view this deployment:

kubectl apply -f siege.yaml
kubectl get deployments

To observe the liveness probe in action, you can periodically monitor the status of the pods by executing the following command:

kubectl get pods
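Alternatively, kubectl can watch the pods and stream status changes as they happen, saving you from re-running the command:

kubectl get pods -w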

At this stage, you shouldn’t notice any restarts. However, as you continuously monitor the status using the above command, you may eventually notice an increment in the “restarts” count:

[Screenshot: kubectl get pods output showing the RESTARTS count increasing]

This indicates that the liveness probe is doing its job by telling Kubernetes when to restart unresponsive pods.

When the Siege load testing tool completes its one-minute runtime, the siege pod will transition into a completed state, indicating that it has finished its task.


Subsequently, the siege pod is restarted and initiates another round of requests at the same intensity: 40 concurrent users for one minute.


As the second round of high-volume requests kicks off, the pods continue to experience memory leaks and restart as necessary. This process continues until one or two instances of the application transition into a CrashLoopBackOff state. This state is a mechanism employed by Kubernetes to avoid continuous restarts if a pod repeatedly crashes shortly after being restarted, as is the case here due to the continual memory leak triggered by the high-volume requests from the siege pod.

Furthermore, the siege pod itself can also reach the CrashLoopBackOff state if it exhausts its available resources. This could potentially occur due to the pod's need to manage an extensive number of concurrent connections, or its inability to handle the subsequent restarts on the Node.js side.


When a pod lands in the CrashLoopBackOff state, Kubernetes temporarily backs off and delays restart attempts. It will make further attempts to restart the pods, allowing them to recover from the CrashLoopBackOff state and transition back into a running state.
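To inspect why a particular pod is crash-looping, describe it and read the Events section at the bottom of the output, which records failed liveness probes and back-off restarts. The pod name below is a placeholder:

kubectl describe pod <pod-name>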


Now this is how the liveness probe works. Though we’ve used the Siege tool to simulate high-traffic conditions, in a real-world scenario, the incoming requests would be generated by actual users interacting with your application.

In situations where a pod running in Kubernetes crashes frequently after restarting, Kubernetes smartly transitions the pod into a CrashLoopBackOff state. This intelligent mechanism allows Kubernetes to impose a delay before attempting to restart the pod again, preventing a rapid, inefficient cycle of restarts and crashes.

While one instance of the application is navigating through the CrashLoopBackOff state, other instances remain active and can continue processing user requests. This prevents total service interruption, thereby demonstrating the necessity of maintaining multiple concurrent instances of your application.

Therefore, it is vital to architect your applications with high availability in mind, in order to ensure that multiple instances or replicas are running concurrently. This helps reduce potential downtime and provides a seamless experience for your users, even when some instances are experiencing problems. It can, however, increase the workload on the remaining active instances (or pods), but that is a discussion for another day.

Configuring the Readiness Probe

The readiness probe is a functionality provided by Kubernetes to determine if a pod is ready to accept requests or, in other words, ready to serve traffic. When a pod is not ready, it is temporarily removed from the service load balancers to prevent it from receiving traffic. This feature is particularly useful for applications that have a significant initialization time or for applications that dynamically manage their readiness based on internal factors.

To demonstrate how a readiness probe works, let's deploy a Node.js application that simulates a cache check operation for 10 seconds at startup. The application isn't ready to serve traffic until this cache check is complete.

We will use a “yaml” file called “cache.yaml” with the following configuration settings:

# cache.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-cache-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: node-cache-app
  template:
    metadata:
      labels:
        app: node-cache-app
    spec:
      containers:
        - name: nodejs-app
          image: mercybassey/cache
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5

The configuration setting above is expected to create 3 replicas of the application named node-cache-app. The application runs in a container using the mercybassey/cache image and listens on port 3000.

The readiness probe is configured to make an HTTP GET request to the /ready endpoint of the application every 5 seconds (periodSeconds: 5), but it will wait for 10 seconds (initialDelaySeconds: 10) before performing the first probe.

The configuration aims to ensure that the application is ready to serve requests before it gets traffic.

If the application takes some time to initialize (for example, to load data into the cache or establish database connections), the readiness probe gives it the time it needs. Kubernetes will only start sending traffic to this pod once the readiness probe gets a successful response from the/readyendpoint.

If, for any reason, the application becomes unready (for example, if it needs to refresh its cache or reconnect to the database), the readiness probe will fail, and Kubernetes will stop sending traffic to that pod until it becomes ready again.

This means that Kubernetes will not send any traffic to a newly created pod from this deployment until at least 10 seconds have passed and the application returns a successful response for the readiness probe. From then on, the probe is checked every 5 seconds.

This approach helps in scenarios where your application might need a few moments after starting to fully initialize and be ready to receive traffic. By using the readiness probe this way, you can ensure that Kubernetes only routes traffic to your pod once it’s fully ready to handle it, thereby potentially avoiding any failed requests that might occur if traffic was sent to the pod too soon.

To create the deployment and view its status, use the following kubectl commands:

kubectl apply -f cache.yaml
kubectl get deployments

You can view the pods created by this deployment and their readiness status with the following kubectl command:

kubectl get pods

Initially, the readiness status of the pods shows 0/1, indicating that the containers in the pods are not yet ready to serve traffic. However, after the cache check operation is complete (after about 10 seconds), the readiness status changes to 1/1, signifying that the application is now ready to serve traffic.


This is basically how the readiness probe works. However, unlike the liveness probe above, if this probe fails the Pod is not restarted – it simply will not be sent any traffic until it’s ready.
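Concretely, "will not be sent any traffic" means the pod's IP is removed from the endpoints of any Service whose selector matches it. This tutorial defines no Service for node-cache-app, but if you added one, you could watch pod IPs leave and rejoin its endpoint list; the service name below is a placeholder:

kubectl get endpoints <service-name> -w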

Configuring the Startup Probe

Just as the name implies, this probe checks if an application within a container has successfully started. If the startup probe fails, Kubernetes assumes that your application is still starting and waits for a while. If the startup probe passes, then any configured liveness and readiness probes will come into effect.

To demonstrate this, we will deploy a Node container image with a long startup time. This container image comprises two routes: the home route (/) and the /health route. Based on how this container image is designed, once it is run, it takes 30 seconds to start, and until it starts, every other route or part of the application is inaccessible.

To deploy this application, create a “yaml” file called “startup.yaml” with the following configuration settings:

# startup.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: node-app
  template:
    metadata:
      labels:
        app: node-app
    spec:
      containers:
        - name: nodejs-app
          image: mercybassey/startup
          ports:
            - containerPort: 3000
          startupProbe:
            httpGet:
              path: /
              port: 3000
            failureThreshold: 30
            periodSeconds: 1
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 45
            periodSeconds: 5

This Kubernetes Deployment defines a set of 3 pods, each running a Node.js application served from the Docker image mercybassey/startup, exposed and expected to run on port 3000.

For each pod, a startup probe is defined that sends an HTTP GET request to the path / on port 3000 of the pod. This probe is sent every 1 second (periodSeconds: 1) until a success response is returned. If the probe does not get a success response within 30 attempts (failureThreshold: 30), Kubernetes will consider the startup probe as failed and will kill the container and start it again. This gives the application 30 seconds to start up successfully before Kubernetes decides that the pod failed to start.

After the application has started and the startup probe has passed, the liveness probe kicks in. The liveness probe also sends an HTTP GET request, but it sends it to the /health endpoint on port 3000. However, this probe does not start until 45 seconds after the container has started (initialDelaySeconds: 45) and then continues to check the health of the application every 5 seconds (periodSeconds: 5).

If the /health endpoint returns a non-success response, Kubernetes will consider the liveness probe as failed. By default, if the liveness probe fails 3 times in a row (which is the failureThreshold, defaulted to 3), Kubernetes will consider the pod to be unhealthy and will kill and restart it.

So based on these configuration settings, the startup probe gives the Node application 30 seconds to start up, and the liveness probe holds off until 45 seconds after the container starts, a grace period measured from container start rather than from startup-probe success. After the grace period, the liveness probe checks the health of the application every 5 seconds.
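As a rule of thumb, a startup probe budgets up to failureThreshold x periodSeconds for the application to come up; here, that is 30 x 1s = 30s. For a hypothetical application needing up to two minutes, a sketch (timings illustrative, not from this tutorial) might be:

startupProbe:
  httpGet:
    path: /
    port: 3000
  failureThreshold: 24   # 24 attempts, 5s apart: up to 120s to start
  periodSeconds: 5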

Now execute the following command to create and view this deployment:

kubectl apply -f startup.yaml
kubectl get deployments
[Screenshot: kubectl get deployments showing READY 0/3]

From the output above, you can see that the containers in the pods haven’t started yet, hence it says ‘0/3’.

You can also confirm that the pods have a ready state of 0/1:

[Screenshot: kubectl get pods showing READY 0/1]

This indicates that the containers in the pods haven't started yet. Once they have started successfully, you'll see the following output:

[Screenshot: kubectl get pods showing READY 1/1 and STATUS Running]

At this point, the liveness probe can proceed to periodically check the liveness of the containers in the pod.

Conclusion

Probes are critical components in the Kubernetes ecosystem that ensure the health and availability of your applications. As we’ve explored, liveness, readiness, and startup probes play vital roles in monitoring your applications, each with a unique and significant part to play.

While the liveness probe monitors the ‘aliveness’ of your applications, ensuring that failing containers are restarted, the readiness probe checks if your application is ready to serve traffic, preventing it from receiving requests until it is fully prepared. On the other hand, startup probes provide an additional buffer for applications that require a longer startup time, allowing them time to initialize before other probes kick in.

You might be wondering why there’s a need for these probes when monitoring tools like Grafana are available. The key difference lies in their roles and approaches. While tools like Grafana provide excellent visualizations and an overview of system health, probes are Kubernetes’ first line of defense in managing application availability directly at the container level.

Monitoring tools help detect problems, but probes proactively prevent issues by controlling traffic flow to containers and managing container lifecycles based on their state. By integrating these different levels of monitoring – from the high-level Grafana dashboards to the control of probes – you can achieve a more robust, resilient, and reliable application environment within Kubernetes.
