Introduction to Service Mesh
Imagine you’re developing a microservices-based e-commerce application running on Azure Kubernetes Service (AKS). Your frontend, product catalog, payment service, and user authentication all need to communicate seamlessly. But soon, challenges emerge:
- Security: How do you ensure all internal traffic is encrypted?
- Observability: Why is the checkout service slow? Where are requests failing?
- Traffic Control: How can you roll out updates without downtime?
This is where Service Mesh comes in.
A Service Mesh is a dedicated infrastructure layer that handles service-to-service communication, providing critical capabilities that Kubernetes alone doesn’t offer.
Key benefits of using a Service Mesh include:
- Automatic encryption of all traffic between services (mTLS)
- Detailed observability into request flows and performance
- Advanced traffic management for canary deployments and A/B testing
- Improved resilience with automatic retries and circuit breaking
Why Kubernetes Alone Isn’t Enough
While Kubernetes provides basic service networking, it lacks critical capabilities for secure, observable, and resilient microservices communication:
1. Insecure Communication
   - Problem: Pod-to-Pod traffic is unencrypted by default (plaintext HTTP/gRPC).
   - Risk: Vulnerable to MITM attacks and eavesdropping within the cluster.
2. Overly Permissive Networking
   - Problem: Any Pod can communicate with any other Pod (unless NetworkPolicies are manually configured; a sketch of such a policy follows this list).
   - Risk: Lateral movement attacks if a Pod is compromised.
3. No Traffic Visibility
   - Problem: No built-in way to see:
     - Which services are communicating.
     - Request rates, latency, or error rates between Pods.
   - Impact: Blind spots in troubleshooting and auditing.
4. No Distributed Tracing
   - Problem: Cannot trace a request’s journey across multiple microservices.
   - Impact: Debugging latency issues or failures requires manual logging correlation.
5. Lack of Resilience Features
   - Problem: No automatic retries, timeouts, or circuit breakers.
   - Impact: Failures cascade (e.g., a single slow service can crash the entire system).
6. Limited Traffic Control
   - Problem:
     - Cannot split traffic (A/B testing, canary deployments).
     - Cannot throttle requests or apply rate limits.
     - Cannot inject faults for testing.
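To see how much manual work the Kubernetes-native controls require, here is a minimal sketch of the kind of NetworkPolicy you would have to hand-write for every service. The payment/checkout labels and the app-namespace namespace are illustrative, not taken from a real cluster:

# Hypothetical policy: only Pods labeled app=checkout may reach the payment
# service. Without something like this, any Pod in the cluster can call it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-allow-checkout
  namespace: app-namespace
spec:
  podSelector:
    matchLabels:
      app: payment
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: checkout

Note that NetworkPolicies operate at L3/L4 only; they cannot express HTTP-level rules, which is one of the gaps Istio’s authorization policies fill.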
How Istio Solves These Gaps
| Kubernetes Limitation | Istio Solution |
|---|---|
| Unencrypted traffic | Automatic mTLS for all Pod-to-Pod communication |
| No network policies | Authorization Policies (L7 rules for HTTP/gRPC) |
| No observability | Kiali (topology maps) + Prometheus/Jaeger (metrics/tracing) |
| No retries/timeouts | Retry policies (VirtualService), circuit breaking (DestinationRule) |
| Basic traffic routing | Advanced traffic splitting/mirroring |
Example: With Istio, you can:
- Encrypt all traffic without app changes.
- See real-time service dependencies in Kiali.
- Automatically retry failed requests (e.g., 3 retries on HTTP 503), as sketched below.
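As a taste of that last point, a retry policy is just a few lines of YAML in a VirtualService. This is a minimal sketch, assuming a service named product in app-namespace; the attempt count and timeout are illustrative:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: product-retries
  namespace: app-namespace   # assumed namespace for this sketch
spec:
  hosts:
    - product
  http:
    - route:
        - destination:
            host: product
      retries:
        attempts: 3          # retry a failed request up to 3 times
        perTryTimeout: 2s    # give each attempt 2 seconds
        retryOn: 5xx         # retry on server errors such as HTTP 503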
Understanding Istio Architecture
Istio consists of two main components:
1. Control Plane (Istiod – The Brain)
The Istio control plane, called Istiod, serves as the management center of the service mesh. It handles configuration management, security policies, and service discovery while distributing these settings to all components in the mesh.
The control plane also generates and manages certificates for mutual TLS authentication between services.
The Istio control plane has the following major components (since Istio 1.5, these are consolidated into the single Istiod binary):
- Pilot → Configures Envoy proxies with routing rules.
- Citadel → Manages TLS certificates for mTLS.
- Galley → Validates and distributes configurations.
2. Data Plane (Envoy – The Muscle)
The data plane consists of Envoy proxy sidecars that get injected into each application pod. These proxies intercept and manage all network communication entering or leaving their pods. They enforce the traffic rules and policies received from the control plane while also collecting telemetry data.
- A sidecar proxy injected into each Pod.
- Handles encryption, load balancing, retries, and observability.
The control plane makes all the decisions about how traffic should flow, while the data plane proxies actually handle the network traffic according to those rules.
Istiod configures the Envoy proxies, which then manage all network communication for their respective application pods, creating a secure and observable service mesh.
Istio and Envoy: Open-Source Foundation
Istio is an open-source service mesh platform that builds on top of Envoy Proxy (also open-source). While Envoy provides the core proxying capabilities, Istio simplifies its management by:
- Translating high-level configurations (YAML/istioctl) into Envoy-specific rules.
- Abstracting complex Envoy configurations into user-friendly policies.
- Adding control-plane features (certificate management, telemetry aggregation).
This layered approach lets developers focus on intent (“secure all traffic”) rather than Envoy implementation details.
Istio’s Opt-In Model
A common misconception:
❌ “Installing Istio means all my cluster traffic is now managed by Istio.”
Reality:
✅ Istio only manages Pods in namespaces explicitly labeled for injection.
✅ Non-Istio namespaces continue working normally.
This opt-in approach allows gradual adoption without breaking existing apps.
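A quick way to check which namespaces have opted in is to list them with the injection label shown as a column (assuming you label namespaces with istio-injection, as shown later in this article):

# Show every namespace with its istio-injection label, if any
kubectl get namespaces -L istio-injection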
Installing Istio on AKS
There are two ways to deploy Istio on an AKS cluster:
- AKS Built-in Istio Integration
- Through Helm Chart
Option 1: AKS Built-in Istio Integration
Azure Kubernetes Service (AKS) now offers managed Istio as an add-on during cluster creation:
Enable via Azure Portal/CLI/IaC:
az aks create --enable-azure-service-mesh -n mycluster -g mygroup
Key Notes:
- Auto-injects sidecars in selected namespaces.
- Includes pre-configured Prometheus/Grafana.
- Limited customization compared to a manual install.
For production, evaluate whether the managed version meets your needs or whether a manual Helm-based installation is preferable. If you are unsure, follow the Helm chart approach, which is a well-tested and widely trusted path.
Option 2: Deploying Istio through Helm Chart
Here’s how to install Istio using Helm:
# Add Istio Helm repository
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update
# Install Istio base components
kubectl create namespace istio-system
helm install istio-base istio/base -n istio-system
helm install istiod istio/istiod -n istio-system --wait
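Before enabling injection, it is worth confirming the control plane came up cleanly. A minimal check:

# istiod should be Running before you label any application namespaces
kubectl get pods -n istio-system
helm ls -n istio-system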
Enabling Sidecar Injection
To enable Istio for specific namespaces:
kubectl create namespace app-namespace
kubectl label namespace app-namespace istio-injection=enabled
Any new pods created in labeled namespaces will automatically have the Istio sidecar injected. The original deployment YAML doesn’t need any changes – here’s an example deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
  namespace: app-namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: product
  template:
    metadata:
      labels:
        app: product
    spec:
      containers:
        - name: product
          image: registry/product-service:v1
          ports:
            - containerPort: 8080
After deployment, verify the sidecar injection:
kubectl get pods -n app-namespace
You should see 2/2 containers ready (your application container plus the Istio proxy).
Behind the Scenes: How the Sidecar Works
- The original Deployment YAML didn’t change; Istio dynamically injects the proxy.
- kubectl describe pod reveals the istio-proxy container.
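If you want to see the injected container by name rather than scanning the describe output, a jsonpath query works; <pod-name> is a placeholder for one of your pods:

# Prints the container names; expect your app container plus istio-proxy
kubectl get pod <pod-name> -n app-namespace -o jsonpath='{.spec.containers[*].name}'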
Communication Patterns in Istio
Communication Between Istio-Enabled Pods
When both pods have Istio sidecars:
- All traffic is automatically encrypted with mTLS
- Traffic management policies are enforced (retries, timeouts)
- Full observability data is collected
Communication Between Istio and Non-Istio Pods
When only one pod has Istio:
- Communication falls back to plain HTTP by default
- Istio still monitors outbound traffic from its own pods
- Some features like mTLS won’t be available
To enforce mTLS across all communications:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
This blocks non-mTLS traffic, ensuring security across the mesh.
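If you cannot enforce STRICT mode everywhere at once, PeerAuthentication can also be scoped to a single namespace in PERMISSIVE mode, which accepts both mTLS and plaintext during a migration. A sketch, assuming a hypothetical namespace named legacy-apps:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: legacy-apps   # hypothetical namespace still being migrated
spec:
  mtls:
    mode: PERMISSIVE       # accept both mTLS and plaintext for now

Namespace-level policies override the mesh-wide default, so you can tighten the mesh gradually.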
Managing Istio: YAML vs. istioctl
Istio can be configured via:
1. Declarative YAML
Apply standard Kubernetes manifests for resources like:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts: ["myapp.example.com"]
  http:
    - route:
        - destination:
            host: myapp
            port:
              number: 80
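Retries live in a VirtualService, but circuit breaking is configured in a companion DestinationRule via outlier detection. A minimal sketch for the same myapp service; all thresholds are illustrative:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 100   # cap queued requests to this host
    outlierDetection:
      consecutive5xxErrors: 5          # eject a host after 5 consecutive 5xx
      interval: 30s                    # how often hosts are analyzed
      baseEjectionTime: 60s            # minimum time an ejected host sits out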
2. Imperative istioctl
The Istio CLI tool provides troubleshooting and validation.
When you install Istio (via Helm or the AKS add-on), the istioctl CLI is not installed automatically, because it belongs on your base machine (where you run kubectl), not in the AKS cluster.
Download it separately from Istio’s release page and install it by following the instructions in the official Istio documentation.
| Key Commands | Purpose |
|---|---|
| istioctl install | Install/upgrade Istio |
| istioctl analyze | Validate configurations |
| istioctl proxy-status | Check sync status of sidecars |
| istioctl verify-install | Verify installation health |
| istioctl dashboard kiali | Open Kiali UI |
Example: Verify sidecar synchronization:
istioctl proxy-status # SHOULD show "SYNCED" for all pods
Observability with Kiali and Jaeger
Istio provides powerful observability tools that require Prometheus as a prerequisite:
Kiali – Service Mesh Visualization
Kiali is Istio’s built-in observability dashboard that provides visualizations of your service mesh. It displays real-time service graphs showing how your microservices connect and communicate with each other.
Kiali helps you monitor traffic flows, identify errors, and verify your Istio configurations. It works by collecting metrics from Prometheus and presenting them in an interactive web interface where you can see service dependencies, health status, and traffic distribution.
Install Kiali in AKS Cluster:
helm repo add kiali https://kiali.org/helm-charts
helm install kiali-server kiali/kiali-server -n istio-system
Key features:
- Interactive service topology maps showing dependencies
- Health monitoring with error rate visualization
- Validation of Istio configurations
Example use case: identifying a misconfigured traffic split that sends too much traffic to a canary version (a correctly weighted split looks like the sketch below).
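For reference, a canary split is expressed as route weights in a VirtualService; Kiali’s graph makes it obvious when the observed traffic does not match these numbers. The service name and subsets here are assumptions, and the v1/v2 subsets would be defined in a matching DestinationRule:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: product-canary
spec:
  hosts:
    - product
  http:
    - route:
        - destination:
            host: product
            subset: v1
          weight: 90   # 90% of traffic stays on the stable version
        - destination:
            host: product
            subset: v2
          weight: 10   # 10% goes to the canary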
Jaeger – Distributed Tracing
Jaeger is Istio’s distributed tracing system that helps track requests as they travel through multiple services. It records the complete path of individual requests across service boundaries, showing how long each operation takes. Jaeger is particularly useful for troubleshooting latency issues and understanding complex service interactions.
It visualizes the entire lifecycle of requests, making it easier to pinpoint performance bottlenecks or failures in microservice architectures.
Install Jaeger in AKS Cluster:
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger jaegertracing/jaeger -n istio-system
Key features:
- End-to-end tracing of requests across services
- Latency breakdown at each hop
- Comparison of successful vs failed requests
Example use case: Diagnosing why checkout requests occasionally take 10 seconds by tracing through all service calls.
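Once installed, you can open the Jaeger UI through istioctl, which port-forwards it to your workstation:

istioctl dashboard jaeger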
Both Kiali and Jaeger integrate with Prometheus for metric collection and together give comprehensive visibility into your service mesh operations.
Istio Ingress Gateway
Istio’s Ingress Gateway provides more advanced capabilities than traditional Kubernetes Ingress.
Traditional Ingress controllers lack:
- mTLS termination (they only do basic TLS).
- Advanced traffic routing (header-based, canary releases).
- Built-in rate limiting.
Istio Ingress Gateway Features
✔ mTLS Termination → Secure external traffic.
✔ Canary Deployments → Route 10% of traffic to a new version.
✔ Rate Limiting → Prevent API abuse.
| Feature | Istio Gateway | Traditional Ingress |
|---|---|---|
| Protocol Support | HTTP/1.1, HTTP/2, gRPC, WebSockets | Primarily HTTP/1.1 |
| Security | mTLS termination, JWT validation | Basic TLS termination |
| Traffic Management | Canary, mirroring, fault injection | Basic path-based routing |
| Observability | Integrated with full mesh telemetry | Limited metrics |
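To make the comparison concrete, here is a minimal sketch of an Istio Gateway that terminates HTTPS for an external host; the host name and the myapp-tls secret are illustrative and assumed to exist:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: myapp-gateway
  namespace: app-namespace
spec:
  selector:
    istio: ingressgateway        # bind to Istio's default ingress gateway pods
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE             # terminate TLS at the gateway
        credentialName: myapp-tls   # Kubernetes TLS secret, assumed to exist
      hosts:
        - myapp.example.com

A VirtualService then attaches to this gateway via its gateways: field to route the incoming traffic to in-mesh services.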
Gateway Scaling Considerations
The Istio Ingress Gateway acts as the entry point for external traffic into your service mesh. Since it handles all incoming requests, proper scaling is critical to avoid bottlenecks and ensure high availability.
We have a dedicated article for Istio Ingress Gateway, where we will discuss the scaling strategy in detail.
Performance Considerations
When implementing Istio, consider these performance implications:
- Latency: Each sidecar proxy adds latency per hop; Istio’s published benchmarks put this in the low single-digit milliseconds, though it varies with traffic and configuration
- Resource Usage: Expect a 10-15% increase in CPU and memory usage
- Pod Density: Sidecars reduce the number of pods you can run per node
Recommendations for production deployments (a sidecar-sizing sketch follows this list):
- Increase node capacity by 20% to account for sidecar overhead
- Limit pod density to 30-50 pods per node depending on workload
- Disable unnecessary telemetry if resource usage is critical
- Conduct performance testing before and after Istio implementation
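If sidecar overhead becomes a concern for a specific workload, Istio supports per-pod proxy sizing through pod-template annotations. A sketch based on the earlier product-service deployment; the resource values are illustrative starting points, not recommendations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: product-service
  namespace: app-namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: product
  template:
    metadata:
      labels:
        app: product
      annotations:
        sidecar.istio.io/proxyCPU: "100m"      # CPU request for the injected proxy
        sidecar.istio.io/proxyMemory: "128Mi"  # memory request for the proxy
    spec:
      containers:
        - name: product
          image: registry/product-service:v1
          ports:
            - containerPort: 8080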
Conclusion
Istio provides a comprehensive solution for managing service-to-service communication in Kubernetes, with benefits including:
- Automatic encryption of all traffic with mTLS
- Detailed observability into service dependencies and performance
- Advanced traffic management for safe deployments
- Improved resilience through circuit breaking and retries
While Istio adds some overhead, the operational benefits typically outweigh the costs for most microservices architectures. Proper planning around scaling and resource allocation will ensure successful implementation.