Kubernetes is a powerful platform for managing containerized applications at scale.

To ensure the health and reliability of these applications, Kubernetes provides a set of mechanisms known as health probes. These probes allow Kubernetes to monitor the state of containers within a pod and take appropriate actions when something goes wrong.

In this article, we’ll explore the three main types of health probes in Kubernetes — liveness, readiness, and startup probes — and how they work to maintain the health and availability of your applications.

Probes Mechanism

Kubernetes provides three main probe mechanisms, any of which can be used for liveness, readiness, and startup checks:

HTTP GET Probe: This probe performs an HTTP GET request to a specified endpoint. If the response code is between 200 and 399, the probe is considered successful.

TCP Socket Probe: This probe tries to establish a TCP connection to the specified port. If the connection is successful, the probe is considered successful.

Exec Probe: This probe runs a specified command inside the container. If the command exits with a status code of 0, the probe is considered successful.
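
The examples in this article all use HTTP GET. As a minimal sketch of the other two mechanisms, the fragment below shows a TCP socket probe and an exec probe. The port 5432 and the file path /tmp/healthy are assumptions chosen for illustration, not values from any particular application.

```yaml
# Container-level fragment (belongs under spec.containers[*]); values are illustrative.
livenessProbe:
  tcpSocket:
    port: 5432            # assumed port; the probe succeeds if a TCP connection can be opened
  periodSeconds: 10
readinessProbe:
  exec:
    command:              # assumed command; exit code 0 counts as success
    - cat
    - /tmp/healthy
  periodSeconds: 5
```

The mechanism is independent of the probe type, so either block could equally be used under livenessProbe, readinessProbe, or startupProbe.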

Liveness Probe: Ensuring the Container is Alive

What is a Liveness Probe?

The liveness probe is a health check that determines whether a container is alive and functioning properly.

If the liveness probe fails, Kubernetes will consider the container to be unhealthy and will restart it according to the pod’s restart policy.

This helps to recover from situations where the container might be running into issues like deadlocks or memory leaks.

How it Works:

Kubernetes performs the liveness probe based on the configuration provided in the pod spec.
If the probe fails (e.g., an HTTP 500 response or a failed TCP connection) failureThreshold times in a row (3 by default), Kubernetes will kill the container and start a new instance of it.
Importantly, only the failing container is restarted, not the entire pod. This ensures that other containers in the pod continue to function without interruption.

Example Configuration:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

Readiness Probe: Controlling Traffic Routing

What is a Readiness Probe?

The readiness probe checks whether a container is ready to handle traffic.

If the readiness probe fails, Kubernetes will remove the pod from the service endpoints, meaning it will not receive any traffic until it passes the readiness check again.

This is crucial during startup, configuration changes, or temporary issues that might render a container unable to serve requests.

How it Works:

When the readiness probe fails, Kubernetes does not restart the container. Instead, it marks the Pod as not ready and stops sending traffic to it.
This mechanism ensures that only healthy and fully initialized pods (and by extension, all the containers within those pods) serve traffic, improving the reliability of your services.
The other containers in the pod continue running, but the pod itself is removed from the service endpoints until all readiness probes pass.

Example Configuration:

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
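
To make the link between readiness and traffic routing concrete, here is a sketch of a Service placed in front of such pods. The name myapp-service and the label app: myapp are hypothetical; the pods would need to carry that label for the selector to match.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp-service     # hypothetical name
spec:
  selector:
    app: myapp            # hypothetical label; must match the labels on the pods
  ports:
  - port: 80              # port exposed by the Service
    targetPort: 8080      # container port, the same one the readiness probe checks
```

With a setup like this, running kubectl get endpoints myapp-service should list only the addresses of pods whose readiness probes are currently passing.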

Startup Probe: Handling Long Initialization Times

What is a Startup Probe?

The startup probe is used to determine whether a container’s application has started successfully.

This probe is particularly useful for applications that take a long time to initialize, such as those with extensive setup processes or dependencies.

How it Works:

The startup probe runs first, delaying the execution of liveness and readiness probes until it succeeds. This ensures that applications with long startup times are not prematurely killed or marked as not ready.
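If the startup probe keeps failing until its failureThreshold is reached, Kubernetes will kill the container, and it is then restarted according to the pod’s restart policy, just as with a liveness probe failure.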
Once the startup probe succeeds, Kubernetes switches to using the liveness and readiness probes for ongoing health checks.

Example Configuration:

startupProbe:
  httpGet:
    path: /startup
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
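
The configuration above relies on the default failureThreshold of 3, which leaves only a short window for startup. As a sketch of how the startup budget is usually sized, the fragment below sets the threshold explicitly. The numbers are assumptions for an application that may need up to five minutes to start.

```yaml
startupProbe:
  httpGet:
    path: /startup
    port: 8080
  periodSeconds: 10
  failureThreshold: 30    # 30 attempts x 10 s = up to 300 s to finish starting
```

The container is restarted only if all 30 attempts fail. As soon as one attempt succeeds, the liveness and readiness probes take over.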

Advanced Health Probe Configurations

When configuring health probes in Kubernetes (whether liveness, readiness, or startup probes), there are three important timing parameters you can set: initialDelaySeconds, periodSeconds, and timeoutSeconds.

These parameters control when and how often the probes are executed, as well as how long Kubernetes waits for a probe to succeed before considering it failed.

1. `initialDelaySeconds`

Purpose: This parameter specifies the amount of time Kubernetes should wait after the container starts before performing the first probe.
Use Case: This is particularly useful if your application needs some time to initialize before it can respond to health checks.
Setting an appropriate initialDelaySeconds helps avoid premature probe failures during startup.

Example:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

In this example, Kubernetes will wait 10 seconds after the container starts before performing the first liveness probe.

2. `periodSeconds`

Purpose: This parameter defines how often (in seconds) Kubernetes will perform the probe after the initial delay. It determines the interval between consecutive probes.
Use Case: You can configure this to control the frequency of health checks based on your application’s characteristics.
For instance, setting a shorter periodSeconds might be useful for critical services where you want frequent health checks.

Example:

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

In this example, Kubernetes will perform a readiness probe every 10 seconds after the initial delay.

3. `timeoutSeconds`

Purpose: This parameter specifies how long Kubernetes will wait for a single probe attempt to complete. If the probe does not respond within this time, that attempt is counted as a failure. The default is 1 second.
Use Case: timeoutSeconds is useful when you want to ensure that a probe doesn’t hang indefinitely.
By setting this parameter, you can control how long Kubernetes waits before determining that the container is unhealthy.

Example:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 2

In this example, Kubernetes will consider the liveness probe a failure if it takes longer than 2 seconds to respond.

By carefully tuning initialDelaySeconds, periodSeconds, and timeoutSeconds, you can ensure that Kubernetes accurately monitors the health of your containers, avoiding unnecessary restarts while still detecting genuine issues promptly.
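
Two further fields work together with these timing parameters: failureThreshold and successThreshold. The sketch below writes out all five fields on one probe. The values shown are the Kubernetes defaults, made explicit here for illustration rather than as a recommendation.

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 0   # default: start probing as soon as the container has started
  periodSeconds: 10        # default: one probe every 10 s
  timeoutSeconds: 1        # default: an attempt taking longer than 1 s counts as failed
  successThreshold: 1      # default: one success marks the container ready again
  failureThreshold: 3      # default: three consecutive failures mark it not ready
```

With these values, a container that stops responding is marked not ready after roughly failureThreshold x periodSeconds = 30 seconds. Note that successThreshold must be 1 for liveness and startup probes; only readiness probes may use a higher value.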

initialDelaySeconds and Startup Probes

A common question is why a startup probe is needed when initialDelaySeconds already exists. Both appear to solve the same problem: buying the container some time before health checks begin.

However, the initialDelaySeconds setting in liveness and readiness probes and the startup probe serve different purposes and address different scenarios in Kubernetes container management.

Initial Delay in Probes: Provides a simple delay before starting liveness and readiness checks. Useful for applications with predictable and short initialization times.
Startup Probe: Offers a dedicated and flexible mechanism for applications with long or variable startup times, ensuring they are given enough time to initialize without interference from liveness or readiness checks.
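
The difference is easiest to see side by side. In the sketch below, the values are assumptions for an application that usually starts in about 20 seconds but occasionally needs close to two minutes.

```yaml
# Option A: fixed delay only. Liveness checks start after 120 s on every
# container start, even when the application is already up after 20 s.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 120
  periodSeconds: 5
---
# Option B: startup probe. Liveness checks begin as soon as the startup probe
# first succeeds, and the container gets up to 24 x 5 s = 120 s before a restart.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 24
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
```

Both options tolerate the same worst-case startup time, but Option B does not leave a fast-starting container unmonitored for the remainder of a fixed delay.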

Interaction Between Probes and Pod Restart Policy

Kubernetes handles each container within a pod independently when it comes to health probes. The pod’s restart policy, combined with the probe configurations, dictates how Kubernetes manages container failures:

Liveness Probe Failure: The failing container is killed and then restarted according to the pod’s restart policy (Always, OnFailure, Never); with Never, the container is killed but not started again. The other containers in the pod continue running unaffected.
Readiness Probe Failure: The failing container is marked as not ready, and the pod is removed from the service endpoints. No container is restarted; the pod resumes receiving traffic once all readiness probes pass.
Startup Probe Failure: The failing container is restarted, similar to a liveness probe failure. The other containers continue running without interruption.
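
As a sketch of where the restart policy sits relative to the probes, the manifest below sets it at the pod level while the probe stays at the container level. The pod name is hypothetical, and OnFailure is chosen only to show a non-default value.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restart-policy-demo   # hypothetical name
spec:
  restartPolicy: OnFailure    # pod-level field; applies to every container in the pod
  containers:
  - name: app-container
    image: myapp:latest
    livenessProbe:            # container-level field
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
```

Pods managed by a Deployment must use Always, so the other two values matter mainly for standalone pods and Jobs.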

Multi-Container Pod Health Probe Settings

In a multi-container pod, each container’s health is monitored independently:

If one container fails a liveness or startup probe, only that container is restarted. Other containers continue to run.
If one container fails a readiness probe, the pod is removed from service endpoints until the container is ready again. The pod remains running, but it does not receive traffic.

Practical Example: Multi-Container Pod with Probes

apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod
spec:
  containers:
  - name: app-container
    image: myapp:latest
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /startup
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
  - name: sidecar-container
    image: sidecar:latest
    ports:
    - containerPort: 9090
    livenessProbe:
      httpGet:
        path: /healthz
        port: 9090
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready
        port: 9090
      initialDelaySeconds: 5
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /startup
        port: 9090
      initialDelaySeconds: 10
      periodSeconds: 5

Best Practices for Kubernetes Health Probe Configuration

1. Align Probes with Application Behavior:

Initial Delay (initialDelaySeconds): Set this to allow your application enough time to start up before the first probe is performed. Avoid setting it too short, which might result in false failures during startup.
Period (periodSeconds): Adjust the frequency of the probes to balance the need for timely health checks with the overhead of performing them. Critical applications might require more frequent checks, while less critical ones can have longer intervals.
Timeout (timeoutSeconds): Ensure this is long enough for the probe to complete under normal conditions, but short enough to detect genuine failures quickly.

2. Probes Should Not Be Too Heavy:

Keep your probes lightweight to avoid adding unnecessary load on the application. For example, avoid performing complex operations or querying large amounts of data as part of the probe.
Simple HTTP GET requests or basic TCP socket checks are often sufficient and minimize the impact on the application’s performance.

3. Use Appropriate Probes for Different Scenarios:

Liveness Probe: Use to detect and recover from application failures (e.g., deadlocks). Configure it to restart the container if the application becomes unhealthy.
Readiness Probe: Use to control traffic routing, ensuring that only healthy and ready containers receive traffic. This is crucial during application startup, configuration changes, or temporary downtime.
Startup Probe: Use for applications with long or variable startup times to ensure that they are not prematurely killed or marked as not ready.

4. Test Probes in Different Environments:

Validate your probe configurations in development, staging, and production environments to ensure they work correctly under varying loads and conditions.
Ensure that the probes accurately reflect the application’s health and readiness in all environments.

5. Monitor and Adjust Probes Regularly:

Regularly monitor the performance and results of your probes using Kubernetes dashboards, logs, and metrics.
Adjust the probe configurations based on observed behavior, especially after making changes to the application or infrastructure.

6. Avoid Setting Probes Too Aggressively:

Avoid setting periodSeconds too low, which can lead to excessive probe traffic and false positives. This can put unnecessary load on your application and Kubernetes.
Avoid setting timeoutSeconds too short, which may cause the probe to fail before the application has a chance to respond, especially under heavy load.

7. Graceful Shutdown Handling:

Ensure that your application can handle SIGTERM signals and shut down gracefully when a container is terminated. This prevents issues where the application might fail liveness or readiness checks unnecessarily during shutdown.
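
On the Kubernetes side, graceful shutdown is typically supported by two settings, sketched below. The 5-second sleep is an assumed pause to let in-flight requests drain, and it presumes the image contains sh and sleep.

```yaml
spec:
  terminationGracePeriodSeconds: 30   # default is 30 s; total time allowed before the container is force-killed
  containers:
  - name: app-container
    image: myapp:latest
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 5"]   # assumed pause; runs before SIGTERM is sent
```

The time spent in the preStop hook counts against the grace period, so the two values should be chosen together.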

8. Use Different Probes for Different Containers:

In multi-container pods, configure health probes independently for each container. This ensures that the health of one container does not incorrectly impact the others.

9. Start with Conservative Defaults:

If unsure about specific values, start with conservative defaults (e.g., longer initialDelaySeconds and periodSeconds) and gradually tune them based on real-world behavior and requirements.

10. Document Probe Configuration:

Clearly document the reasoning behind your probe settings within your infrastructure-as-code (e.g., YAML files) or operational guides. This helps in maintaining consistency and understanding the purpose of each configuration during troubleshooting.

Example of a Well-Configured Probe:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 15
  timeoutSeconds: 2
  successThreshold: 1

startupProbe:
  httpGet:
    path: /startup
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 6

Key Points Related to Kubernetes Health Probes

Health Probes are Configured at the Container Level:

In Kubernetes, health probes are defined for individual containers, not for the pod as a whole. Each container in a pod can have its own set of probes, which allows for granular monitoring of the container’s health and readiness.

Configuring Health Probes is Optional:

Setting up health probes is not mandatory in Kubernetes. Whether or not to configure them depends on the specific needs of the application.
For simpler applications, probes may not be necessary, whereas more complex or critical applications can benefit significantly from them.

Not All Types of Probes are Required:

It is not necessary to configure all three types of health probes (Startup, Liveness, Readiness) for every container.
The decision to use specific probes should be based on the application’s requirements.
For instance, a Startup Probe might be essential for an application with a long initialization time, but not needed for a lightweight service that starts quickly.

Not Every Container Requires Probes:

It’s also not mandatory to define health probes for every container in a pod. Depending on the role and criticality of each container, some may require probes while others do not.
For example, a sidecar container that performs logging might not need the same level of health monitoring as the main application container.
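
A minimal sketch of such a pod is shown below: the main container has readiness and liveness probes, while the sidecar has none. The pod name and the log-shipper image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar    # hypothetical name
spec:
  containers:
  - name: app-container
    image: myapp:latest
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
  - name: log-sidecar
    image: log-shipper:latest   # hypothetical image; no probes defined
```

A container without a readiness probe is treated as ready once it is running, and without a liveness probe it is restarted only when its process exits.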

Startup Probe vs. Liveness and Readiness Probes:

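The Startup Probe runs only during the container’s startup phase: it is repeated until it succeeds for the first time (or until its failureThreshold is exhausted and the container is restarted), and it is not run again for that container once it has succeeded. It ensures the application has fully initialized before the container is subjected to other checks.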
In contrast, Liveness and Readiness Probes continue to run throughout the container’s lifecycle. Liveness probes monitor the ongoing health of the container, while readiness probes determine if the container is ready to serve traffic.

Health Probes Can Prevent Unnecessary Restarts:

Properly configured health probes can help avoid unnecessary restarts.
For instance, handling temporary conditions with a readiness probe rather than a liveness probe means that a container which is briefly unable to serve traffic, but is otherwise healthy, is taken out of rotation instead of being restarted.
Similarly, a startup probe can prevent premature liveness probe failures by allowing the container sufficient time to initialize.

Impact on Service Availability:

Readiness probes play a crucial role in maintaining service availability. By marking containers as “not ready,” they ensure that only fully operational containers receive traffic.
This is particularly important in high-availability scenarios where traffic must only be routed to containers that can handle requests effectively.

Conclusion

Kubernetes health probes — liveness, readiness, and startup — are essential tools for maintaining the health and availability of your applications.

By correctly configuring these probes, you can ensure that Kubernetes effectively monitors your containers, restarts them when necessary, and routes traffic only to those that are ready to serve requests.

Understanding and leveraging these probes will help you build more resilient and reliable containerized applications in Kubernetes.
