In modern cloud-native applications, maintaining high availability during updates is crucial. Kubernetes provides a powerful mechanism called the rollout update strategy that allows you to update your application with minimal or zero downtime.
This article will explore how rollout strategies work, and how to manage them effectively in your Kubernetes deployments.
What is a Rollout Update Strategy?
The rollout update strategy (specifically the RollingUpdate strategy in Kubernetes) is a deployment approach that gradually replaces instances of your application with new ones.
Instead of stopping all pods simultaneously, Kubernetes:
- Starts new pods with the updated configuration
- Waits for them to become healthy
- Then terminates the old pods
- Repeats this process until all pods are updated
This approach is the default strategy for Kubernetes Deployments and ensures your application remains available during the update process.
Why Use Rollout Update Strategies?
Rollout updates provide several key benefits:
- Minimal downtime: Your application remains available to users during updates
- Reduced risk: If the new version fails, only a subset of pods is affected
- Gradual rollout: You can monitor the new version’s performance before fully deploying it
- Rollback capability: Easy to revert if something goes wrong
Configuring RollingUpdate Strategy in Deployment YAML
Here’s how to configure the rolling update strategy in a Deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Number of extra pods allowed during update
      maxUnavailable: 0  # Number of pods that can be unavailable during update
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.2.0
        ports:
        - containerPort: 8080
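Key parameters:
- maxSurge: Controls how many extra pods can be created above the desired replica count during the update (absolute number or percentage)
- maxUnavailable: Controls how many pods can be unavailable during the update (absolute number or percentage)
The two values cannot both be 0, because the rollout would then have no room to make progress.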
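When a percentage is used, Kubernetes converts it to a pod count based on the replica count: maxSurge is rounded up and maxUnavailable is rounded down. The sketch below mirrors that arithmetic in plain shell. The function names resolve_surge and resolve_unavailable are hypothetical helpers written for this illustration; they are not part of kubectl.

```shell
# Hypothetical helpers (not part of kubectl): convert a maxSurge or
# maxUnavailable value into an absolute pod count for a given replica count.

# maxSurge: percentages are rounded UP.
resolve_surge() {
  replicas=$1
  value=$2
  case "$value" in
    *%) pct=${value%\%}; echo $(( (replicas * pct + 99) / 100 )) ;;
    *)  echo "$value" ;;   # already an absolute number
  esac
}

# maxUnavailable: percentages are rounded DOWN.
resolve_unavailable() {
  replicas=$1
  value=$2
  case "$value" in
    *%) pct=${value%\%}; echo $(( replicas * pct / 100 )) ;;
    *)  echo "$value" ;;   # already an absolute number
  esac
}

resolve_surge 10 25%         # prints 3 (2.5 rounded up)
resolve_unavailable 10 25%   # prints 2 (2.5 rounded down)
```

The rounding direction matters most for small deployments: with 3 replicas, a maxUnavailable of 25% resolves to 0 pods, while a maxSurge of 25% resolves to 1.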
Performing a Rollout Update
1. Updating the Deployment (Triggering Rolling Update)
The most common way to perform a rollout update is by updating your deployment:
# Update the container image (will trigger rolling update)
kubectl set image deployment/my-app my-app=my-app:1.3.0
# Alternative: Edit the deployment directly
kubectl edit deployment my-app
2. Forcing a Rollout Restart Without Configuration Changes
Sometimes you need to restart pods (e.g., for configMap or secret updates) without changing the deployment spec:
# Method 1: Using kubectl rollout restart
kubectl rollout restart deployment/my-app
# Method 2: Adding an annotation (useful in CI/CD pipelines)
kubectl patch deployment my-app -p '{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":"'"$(date +%Y-%m-%dT%H:%M:%S%z)"'"}}}}}'
Monitoring Rollout Status
To check the status of your rollout update:
# View rollout status
kubectl rollout status deployment/my-app
# View rollout history
kubectl rollout history deployment/my-app
# View details of a specific revision
kubectl rollout history deployment/my-app --revision=2
Handling Stuck Rollouts
Rollouts can get stuck for several reasons, such as failed health checks, resource constraints, or image pull errors. Here’s how to handle them:
1. Check Why It’s Stuck
# Describe the deployment to see events
kubectl describe deployment my-app
# Check pod status
kubectl get pods -l app=my-app
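A Deployment also reports a Progressing condition in its status. If no progress is made within spec.progressDeadlineSeconds (600 seconds by default), the condition's reason becomes ProgressDeadlineExceeded. A minimal sketch of a check built on that follows; the function name rollout_is_stuck is a hypothetical helper, not a kubectl command.

```shell
# Hypothetical helper: exits 0 if the Deployment has exceeded its progress
# deadline, non-zero otherwise.
rollout_is_stuck() {
  # Read the reason of the "Progressing" condition from the Deployment status.
  reason=$(kubectl get deployment "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}')
  [ "$reason" = "ProgressDeadlineExceeded" ]
}
```

It could be used as a guard in a script, for example: if rollout_is_stuck my-app; then kubectl describe deployment my-app; fi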
2. Cancel a Stuck Rollout
# Undo the rollout (rollback to previous version)
kubectl rollout undo deployment/my-app
# If the rollout was paused with "kubectl rollout pause", resume it
# (this does not unblock a rollout that is failing for other reasons)
kubectl rollout resume deployment/my-app
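In a CI/CD pipeline, the update, status check, and rollback can be chained so that a failed rollout reverts on its own. The following is a sketch under stated assumptions: deploy_with_rollback is a hypothetical wrapper name, and the 120s default timeout is an arbitrary illustrative value that should be tuned to your pod startup time.

```shell
# Hypothetical wrapper: update an image, wait for the rollout to finish,
# and roll back if it does not complete within the timeout.
# Usage: deploy_with_rollback DEPLOYMENT CONTAINER IMAGE [TIMEOUT]
deploy_with_rollback() {
  deployment=$1
  container=$2
  image=$3
  timeout=${4:-120s}   # illustrative default, not a Kubernetes default

  # Trigger the rolling update.
  kubectl set image "deployment/$deployment" "$container=$image" || return 1

  # "rollout status" exits non-zero if the rollout fails or times out.
  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    echo "rollout of $deployment succeeded"
  else
    echo "rollout of $deployment failed; rolling back" >&2
    kubectl rollout undo "deployment/$deployment"
    return 1
  fi
}
```

For the Deployment used in this article, a call would look like: deploy_with_rollback my-app my-app my-app:1.3.0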
3. Advanced Troubleshooting
For more complex issues:
- Check resource quotas (kubectl describe quota)
- Verify pod resource requests/limits
- Examine readiness/liveness probe configurations
- Check for node issues (kubectl get nodes)
Optimal Combinations of maxSurge and maxUnavailable for Kubernetes Rollouts
The ideal combination of maxSurge and maxUnavailable depends on your specific requirements for availability, rollout speed, and resource capacity. Below are detailed configurations for different scenarios.
Understanding the Interplay Between Parameters
These two parameters work together to control your rollout behavior:
| Parameter | Effect | Value Format |
|---|---|---|
| maxSurge | How many extra pods can be created during update | Absolute number (e.g., 2) or percentage (e.g., 25%) |
| maxUnavailable | How many pods can be unavailable during update | Absolute number (e.g., 1) or percentage (e.g., 10%) |
Key Insight: Higher maxSurge enables faster rollouts but requires more cluster resources. Lower maxUnavailable provides better availability but slows down deployments.
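To see how the two limits shape a rollout, the sketch below steps through a simplified model in plain shell. It assumes every new pod becomes ready immediately and treats each loop iteration as one scale-up/scale-down round; the real Deployment controller reacts to readiness events, so treat this as an illustration of the bounds (total pods never exceed replicas + maxSurge, available pods never drop below replicas - maxUnavailable), not of exact timing. The function name simulate_rollout is hypothetical, and it takes absolute numbers only.

```shell
# Hypothetical simulation (simplified model, not the real controller logic).
# Usage: simulate_rollout REPLICAS MAX_SURGE MAX_UNAVAILABLE  (absolute numbers)
simulate_rollout() {
  replicas=$1
  surge=$2
  unavailable=$3

  # Kubernetes rejects a strategy where both values are 0.
  if [ "$surge" -eq 0 ] && [ "$unavailable" -eq 0 ]; then
    echo "maxSurge and maxUnavailable cannot both be 0" >&2
    return 1
  fi

  old=$replicas
  new=0
  step=0
  while [ "$old" -gt 0 ] || [ "$new" -lt "$replicas" ]; do
    # Scale up: total pods may not exceed replicas + maxSurge.
    room=$(( replicas + surge - old - new ))
    need=$(( replicas - new ))
    add=$(( room < need ? room : need ))
    new=$(( new + add ))

    # Scale down: available pods must stay >= replicas - maxUnavailable.
    removable=$(( old + new - (replicas - unavailable) ))
    remove=$(( removable < old ? removable : old ))
    old=$(( old - remove ))

    step=$(( step + 1 ))
    echo "step $step: old=$old new=$new"
  done
}

simulate_rollout 3 1 0
# step 1: old=2 new=1
# step 2: old=1 new=2
# step 3: old=0 new=3
```

Running simulate_rollout 4 4 0 (the equivalent of maxSurge: 100% for 4 replicas) finishes in a single step, which illustrates why a higher maxSurge speeds up rollouts at the cost of temporarily doubling resource usage.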
Recommended Strategy Combinations
1. Maximum Availability (Zero Downtime)
strategy:
  rollingUpdate:
    maxSurge: 100%      # Double capacity during rollout
    maxUnavailable: 0   # No pods ever unavailable
2. Balanced Approach (Recommended Default)
strategy:
  rollingUpdate:
    maxSurge: 25%         # Some extra capacity
    maxUnavailable: 25%   # Some tolerance for pod churn
3. Fast Rollout (Speed Optimized)
strategy:
  rollingUpdate:
    maxSurge: 50%         # Significant extra capacity
    maxUnavailable: 50%   # Large parallel rollout
4. Large Deployment Optimization (100+ pods)
strategy:
  rollingUpdate:
    maxSurge: 10%         # Controlled extra capacity
    maxUnavailable: 10%   # Small batches
Special Case Combinations
Blue-Green Like Behavior
strategy:
  rollingUpdate:
    maxSurge: 100%      # Full duplicate set
    maxUnavailable: 0   # Maintain full capacity
Effect: Creates complete new set before deleting old (similar to blue-green but within a single deployment)
Rolling Restart Optimization
strategy:
  rollingUpdate:
    maxSurge: 1         # One extra pod at a time
    maxUnavailable: 1   # One pod down at a time
Effect: Classic rolling update with minimal resource overhead
Decision Factors for Your Ideal Combination
Consider these when choosing values:
- Application Criticality: More critical = lower maxUnavailable
- Resource Headroom: Limited resources = lower maxSurge
- Pod Startup Time: Slow-starting apps = lower maxUnavailable
- Deployment Size: Larger deployments = smaller percentages
- Traffic Patterns: Spiky traffic = more conservative values
Practical Examples
E-Commerce Checkout Service
# High availability needed
strategy:
  rollingUpdate:
    maxSurge: 50%
    maxUnavailable: 10%
Background Worker
# Can tolerate some downtime
strategy:
  rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 25%
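Canary-Like Rollout
# Very controlled rollout: one new pod at a time, no loss of capacity
# Note: the rollout still proceeds automatically; use "kubectl rollout pause"
# to hold it at a partial rollout while you evaluate the new version
strategy:
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0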
Recommendations
- Start Conservative: Begin with restrictive values and loosen as you gain confidence
- Monitor Rollout Metrics: Track deployment duration, resource utilization, and error rates
- Combine with Readiness Probes: Ensure accurate pod health checks
- Test Under Load: Validate during peak traffic simulations
Troubleshooting Poor Combinations
Symptom: Rollouts take too long
Fix: Increase maxSurge and/or maxUnavailable
Symptom: Resource exhaustion during rollout
Fix: Decrease maxSurge, add resource limits
Symptom: Brief service interruptions
Fix: Decrease maxUnavailable, improve readiness probes
The “ideal” combination ultimately depends on your specific requirements, but the balanced approach (25%/25%) is an excellent starting point for most production workloads. Adjust based on your actual rollout behavior and availability needs.
Rollout Strategy vs Pod Disruption Budget: Key Differences
1. Purpose and Scope:
- Rollout Strategy (Deployment):
  - Controls how pods are replaced during deployments/updates
  - Managed through spec.strategy in Deployment resources
  - Built-in strategy types: RollingUpdate and Recreate (blue-green is a pattern built on top of Deployments and Services, not a strategy type)
- Pod Disruption Budget (PDB):
  - Limits voluntary disruptions that go through the Eviction API
  - Governs node drains, cluster autoscaler scale-down, and similar admin actions
  - Cannot prevent involuntary disruptions such as node failures
  - Separate resource (policy/v1) that works alongside Deployments
2. When Each Applies:
| Scenario | Rollout Strategy | PDB |
|---|---|---|
| kubectl rollout restart | ✅ Yes | ❌ No |
| Deployment image update | ✅ Yes | ❌ No |
| kubectl drain | ❌ No | ✅ Yes |
| Cluster autoscaler downscale | ❌ No | ✅ Yes |
| Manual pod deletion | ❌ No | ❌ No (direct deletion bypasses the PDB) |
| Node failure | ❌ No | ❌ No (involuntary disruption; it cannot be blocked) |
3. Key Technical Differences:
| Characteristic | Rollout Strategy | PDB |
|---|---|---|
| Configuration Level | Per-deployment | Cross-cutting (applies to all pods matching the selector) |
| Disruption Type | Deployment updates | Voluntary evictions (e.g., node drains) |
| Parameters | maxSurge, maxUnavailable | minAvailable or maxUnavailable |
| Enforcement | Deployment controller | Eviction API (budget status tracked by the disruption controller) |
4. When to Use Each:
Use Rollout Strategy When:
- You’re actively deploying new versions
- You need to control the pace of pod replacements
- You need to manage resource usage during updates
Use PDB When:
- You need to guarantee a minimum number of available instances
- You want protection against cluster maintenance operations
- You want to prevent the autoscaler from removing too many pods
5. They Can Work Together:
Example scenario:
- PDB ensures at least 2 pods stay available during evictions (minAvailable: 2)
- Rollout strategy updates pods gradually (maxUnavailable: 1)
- Result:
  - During updates: the Deployment controller follows the rollout strategy and is not limited by the PDB, but pods that are unavailable because of the update still count against the budget
  - During node drains: PDB prevents excessive pod termination
Best Practice:
For critical applications, use both:
# Deployment
spec:
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
# PDB
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
This combination controls the pace of rolling updates and limits voluntary disruptions such as node drains and autoscaler scale-downs.
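Before draining a node, it can help to check how many evictions the budget currently allows. A PDB exposes this in its status as disruptionsAllowed; for example, with 3 healthy replicas and minAvailable: 2 the value is 1, and it can drop to 0 while a rolling update has a pod unavailable. The sketch below reads that field; the function name pdb_allows_eviction is a hypothetical helper, not a kubectl command.

```shell
# Hypothetical helper: exits 0 if the PDB currently allows at least one
# eviction, non-zero otherwise.
pdb_allows_eviction() {
  allowed=$(kubectl get pdb "$1" -o jsonpath='{.status.disruptionsAllowed}')
  # Treat an empty result (status not yet populated) as 0.
  [ "${allowed:-0}" -gt 0 ]
}
```

A maintenance script might use it as a guard, for example: if pdb_allows_eviction my-app-pdb; then echo "safe to drain"; fi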