Istio
Why It Exists
So the decision has been made: a service mesh is needed. (If that decision hasn't been made yet, read the Service Mesh page first.) Now comes the choice between Istio and everything else.
Istio is the most feature-complete service mesh available. It provides L7 traffic management, fine-grained authorization policies, automatic mTLS, distributed tracing integration, fault injection, traffic mirroring, canary deployments, and a Wasm plugin system for custom Envoy filters. Google, IBM, and Lyft built it, Salesforce runs it across their infrastructure, and Airbnb uses it to manage 1,500+ services.
The tradeoff is complexity. Istio has more CRDs, more configuration knobs, and more ways to misconfigure things than any other mesh. If Linkerd is the Honda Civic (reliable, efficient, gets the job done), Istio is the BMW M3. More power, more control surfaces, more opportunities to wrap around a tree without sufficient expertise.
That said, for advanced traffic management (canary with header-based routing, fault injection, traffic mirroring), multi-cluster mesh, or fine-grained L7 authorization policies, Istio is the only open-source mesh that covers all of those. The question isn't whether Istio is powerful. It's whether that power is needed and the team can operate it.
How It Works
Istiod: The Unified Control Plane
Before Istio 1.5, the control plane was three separate processes: Pilot (traffic management), Citadel (certificate authority), and Galley (configuration validation). Running three processes meant three things to monitor, three things to scale, three things that could fail independently. In 1.5, they merged everything into a single binary called Istiod.
Pilot compiles VirtualService and DestinationRule CRDs into Envoy-native xDS configuration and pushes it to every proxy over gRPC streaming. When a new VirtualService is applied, Pilot translates it into route rules that Envoy understands, then distributes the updated config to every affected sidecar. Since Istio 1.12, this uses delta xDS (incremental updates) instead of full pushes, which reduced push time by roughly 90% for large meshes.
Citadel acts as the mesh Certificate Authority. It issues SPIFFE SVIDs (X.509 certificates) with a 24-hour TTL to each sidecar. When Pod A calls Pod B, both sidecars present their SPIFFE identity (spiffe://cluster.local/ns/<namespace>/sa/<service-account>) and verify each other's certificate. No application code changes. No cert-manager. Citadel handles issuance, rotation, and revocation.
Galley (now internal to Istiod) validates Istio CRDs at admission time and distributes configuration internally. There's no direct interaction with it, but it's the reason malformed CRDs get rejected before they break the mesh. Assuming the validating webhook hasn't been disabled, which some teams do and then regret.
The xDS API
xDS is the protocol Envoy uses to receive dynamic configuration from a control plane. Four APIs matter:
- CDS (Cluster Discovery Service): defines upstream service clusters. When a new Kubernetes Service appears, Istiod pushes a CDS update so Envoy knows about it.
- EDS (Endpoint Discovery Service): maps clusters to actual pod IPs. When pods scale up or down, EDS updates the endpoint list.
- LDS (Listener Discovery Service): defines what ports Envoy listens on and what filter chains to apply (mTLS, authorization, rate limiting).
- RDS (Route Discovery Service): maps URL paths and headers to upstream clusters. This is where VirtualService routing rules end up.
To debug xDS issues, use istioctl proxy-config clusters/routes/listeners/endpoints <pod>. For deeper inspection, hit the Envoy admin API directly at localhost:15000 on any sidecar pod. These are the tools that save the on-call engineer at 3 AM.
The CRD Model
VirtualService is the most powerful and most misused CRD. It controls traffic routing: canary weights, header-based routing, URL rewrites, timeouts, retries, fault injection, and traffic mirroring. A single misconfigured VirtualService with a hosts: ["*"] match will redirect traffic mesh-wide.
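For concreteness, here is a sketch of a canary VirtualService. The reviews service, prod namespace, subset names, and x-canary header are hypothetical, not from this document. It routes requests carrying the canary header to v2 and splits everything else 90/10:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-canary               # hypothetical service
  namespace: prod
spec:
  hosts:
    - reviews.prod.svc.cluster.local # always scope hosts; never ["*"]
  http:
    - match:                         # header-based routing to the canary
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: reviews.prod.svc.cluster.local
            subset: v2
    - route:                         # default weighted split
        - destination:
            host: reviews.prod.svc.cluster.local
            subset: v1
          weight: 90
        - destination:
            host: reviews.prod.svc.cluster.local
            subset: v2
          weight: 10
```

The v1 and v2 subsets referenced here must be defined in a DestinationRule, or Envoy has no way to map them to pod labels.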
DestinationRule controls what happens after routing: connection pool settings (max connections, max pending requests), outlier detection (ejection thresholds), TLS mode (ISTIO_MUTUAL, SIMPLE, DISABLE), and load balancing algorithm (ROUND_ROBIN, LEAST_REQUEST, RANDOM).
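A matching DestinationRule sketch for the same hypothetical service; the host, thresholds, and subset labels are illustrative, not prescriptive defaults:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews                       # hypothetical service
  namespace: prod
spec:
  host: reviews.prod.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100           # cap connections to each backend
      http:
        http1MaxPendingRequests: 50   # queue limit before requests fail fast
    outlierDetection:
      consecutive5xxErrors: 5         # eject a backend after 5 straight 5xx
      interval: 30s
      baseEjectionTime: 60s
    loadBalancer:
      simple: LEAST_REQUEST
    tls:
      mode: ISTIO_MUTUAL              # mesh-issued mTLS between sidecars
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```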
Gateway configures ingress and egress through Istio's Envoy-based gateway pods. This is not a Kubernetes Ingress resource. Istio Gateways bind to specific ports and hosts, then delegate routing to VirtualService rules. Run separate gateway deployments per team or environment. Sharing a single gateway across the cluster is asking for configuration conflicts.
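A per-team Gateway sketch, assuming a dedicated gateway deployment per team as recommended above; the team name, hostname, and credential name are placeholders:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: team-a-gateway          # hypothetical team-owned gateway
  namespace: team-a
spec:
  selector:
    istio: ingressgateway       # binds to that team's gateway pods
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: team-a-cert   # TLS secret in the gateway's namespace
      hosts:
        - "team-a.example.com"        # bind explicit hosts, never "*"
```

A VirtualService then attaches to this Gateway by name to define the actual routes.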
AuthorizationPolicy is the zero-trust enforcement point. It controls L7 access: allow or deny by source principal, namespace, path, method, and headers. Misconfigure these and all traffic to a namespace gets blocked (see Failure Scenarios below).
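A minimal ALLOW policy sketch, assuming a hypothetical frontend service account calling a reviews workload; all names are illustrative:

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend          # hypothetical policy
  namespace: prod
spec:
  selector:
    matchLabels:
      app: reviews              # applies only to this workload
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/prod/sa/frontend   # SPIFFE-derived identity
      to:
        - operation:
            methods: ["GET"]
            paths: ["/reviews/*"]
```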
PeerAuthentication controls mTLS mode per namespace or workload. STRICT requires mTLS. PERMISSIVE accepts both plaintext and mTLS (for migration). DISABLE turns off mTLS. Teams often leave PERMISSIVE on permanently, which defeats the purpose of the mesh.
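Enforcing STRICT mTLS for a whole namespace is a one-screen resource; the prod namespace here is illustrative:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: prod
spec:
  mtls:
    mode: STRICT    # PERMISSIVE is a migration state, not a destination
```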
Ambient Mode
Istio Ambient (GA in Istio 1.22) is the biggest architectural change since the Pilot/Citadel/Galley merge. Instead of injecting a sidecar into every pod, Ambient uses two components:
ztunnel runs as a DaemonSet on every node. It handles L4 concerns: mTLS encryption, connection-level authorization, and telemetry. Every pod on the node gets mTLS automatically through ztunnel, with no sidecar and no application changes. Memory cost: roughly 20MB per node instead of 100MB per pod.
Waypoint proxies are optional per-namespace Envoy instances that handle L7 features: HTTP routing, header-based authorization, retries, fault injection. Waypoints only need to be deployed for namespaces that actually need L7 traffic management. Most namespaces only need L4 (mTLS + basic metrics), so most namespaces don't need a waypoint.
The resource savings are real. For a 1,000-pod cluster, sidecar mode costs roughly 100GB of additional memory. Ambient mode with ztunnel costs roughly 2GB (20MB * 100 nodes, assuming about 10 pods per node). Even with waypoint proxies for 20% of namespaces, that's roughly 80% less overhead. The tradeoff: Ambient has less production mileage than sidecar mode and some L7 features require explicit waypoint deployment. I wouldn't bet a bank's production on it without thorough testing in staging first.
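The arithmetic behind these savings can be sketched as a small helper. The 100MB-per-sidecar and 20MB-per-node figures come from this page; the 10-pods-per-node density is an assumption you should replace with your own cluster's numbers:

```python
def mesh_overhead_mb(pods: int, pods_per_node: int = 10,
                     sidecar_mb: int = 100, ztunnel_mb: int = 20) -> dict:
    """Rough memory overhead (MB) for sidecar mode vs. ambient L4-only mode."""
    nodes = pods // pods_per_node
    return {
        "sidecar_mode": pods * sidecar_mb,      # one Envoy proxy per pod
        "ambient_l4_only": nodes * ztunnel_mb,  # one ztunnel per node
    }

print(mesh_overhead_mb(1000))
# {'sidecar_mode': 100000, 'ambient_l4_only': 2000}  -> ~100GB vs ~2GB
```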
Multi-Cluster Mesh
Istio supports two multi-cluster topologies:
Primary-remote: one cluster runs Istiod (primary), other clusters connect as remotes. Simpler to operate. Single point of failure for the control plane. Good for hub-and-spoke architectures where one region is the control hub.
Multi-primary: each cluster runs its own Istiod instance. Clusters sync service discovery through east-west gateway pods using SNI-based routing. More resilient (no single control plane), but configuration sync between Istiod instances adds complexity. This is the right choice for active-active multi-region setups where each region needs independent survivability.
Cross-cluster traffic flows through east-west gateway pods. Service A in Cluster 1 calls Service B in Cluster 2 by routing through the local east-west gateway, which forwards over mTLS to the remote gateway, which delivers to Service B. The application sees a normal service call. The mesh handles the cross-cluster routing transparently.
Wasm Plugins
Wasm plugins extend Envoy's filter chain without rebuilding the proxy binary. Use cases include custom authentication, request/response header manipulation, and specialized logging. Write the plugin in Rust, Go, or C++, compile to Wasm, and deploy via the WasmPlugin CRD.
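Deployment through the WasmPlugin CRD looks roughly like this; the plugin name, OCI registry URL, and workload selector are placeholders:

```yaml
apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: custom-auth             # hypothetical plugin
  namespace: prod
spec:
  selector:
    matchLabels:
      app: reviews              # attach only where it's actually needed
  url: oci://registry.example.com/plugins/custom-auth:v1   # placeholder image
  phase: AUTHN                  # run before Istio's authentication filters
```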
One warning: each Wasm plugin adds roughly 1-3ms of latency to every request passing through that sidecar. A plugin that makes an external HTTP call (like hitting an auth service) can add much more. Load-test plugins before deploying to production. I've seen a "simple" auth plugin add 15ms P99 because it made a synchronous call to an external JWT validation service on every request.
Production Considerations
- Revision-based canary upgrades. Run two Istiod revisions side by side (e.g., 1-21 and 1-22). Migrate namespaces one at a time by changing the istio.io/rev label. If something breaks, relabel back to the old revision. Never do in-place Istiod upgrades in production.
- Namespace-by-namespace adoption. Label namespaces with istio-injection=enabled incrementally. Start with low-risk internal services. Move to customer-facing services after building confidence and tooling. A full mesh rollout across 200+ services should take 3-6 months, not 3-6 days.
- istioctl analyze in CI. Run istioctl analyze --all-namespaces as a CI gate before applying any Istio CRD changes. This catches misconfigurations (orphaned VirtualServices, conflicting DestinationRules, invalid selectors) before they hit the cluster.
- Selective access logging. Don't enable Envoy access logs mesh-wide. At 10K RPS per service across 200 services, that's 2M log lines per second. Enable access logging per namespace or per service for debugging, then turn it off.
- Gateway separation. Run separate ingress gateway deployments per team or per environment. A shared gateway means one team's misconfigured VirtualService can break another team's ingress. Isolation beats efficiency here.
- Sidecar CRD for config scoping. Use the Sidecar CRD to limit which services each proxy knows about. Without it, every sidecar receives the entire mesh configuration. At 200+ services, xDS push sizes become the bottleneck. The Sidecar CRD alone can reduce push sizes by 90%.
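A namespace-wide Sidecar resource that implements this scoping might look like the following sketch; the prod namespace is illustrative:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default            # applies to every workload in the namespace
  namespace: prod
spec:
  egress:
    - hosts:
        - "./*"            # services in this namespace only
        - "istio-system/*" # plus the control plane
```

Any service the namespace calls outside these scopes must be added explicitly, so pair this with dependency mapping before rollout.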
Failure Scenarios
Scenario 1: xDS Push Storm After Wildcard VirtualService. Someone applies a VirtualService with hosts: ["*"] in a mesh with 2,000 sidecars. Istiod tries to push the updated config to all 2,000 proxies simultaneously. CPU spikes to 100%, xDS push latency jumps from 2s to 60s. New pods sitting in the queue get stale config, and recently deployed services start routing to the wrong backends. Detection: pilot_xds_pushes rate spikes, pilot_proxy_push_time_seconds P99 exceeds 30s, pilot_xds_push_context_errors starts climbing. Recovery: delete or scope down the offending VirtualService. Istiod will push the corrected config within seconds once it recovers. Prevention: always use namespace-scoped VirtualServices. Deploy the Sidecar CRD to limit each namespace's config visibility. Add a CI check that rejects VirtualServices with hosts: ["*"] unless explicitly approved.
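The CI guard described under Prevention can be sketched in plain Python. This assumes manifests are already parsed into dicts (for example by a YAML loader) and only flags the literal bare wildcard; the function name and structure are illustrative:

```python
def has_wildcard_host(manifest: dict) -> bool:
    """Flag VirtualServices whose hosts list contains a bare "*"."""
    if manifest.get("kind") != "VirtualService":
        return False
    hosts = manifest.get("spec", {}).get("hosts", [])
    return "*" in hosts

# A manifest like the one in this scenario would be rejected:
vs = {
    "kind": "VirtualService",
    "metadata": {"name": "oops"},
    "spec": {"hosts": ["*"], "http": []},
}
print(has_wildcard_host(vs))  # True
```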
Scenario 2: Envoy Sidecar OOMKill Under High Connection Count. A service handles 50,000 concurrent WebSocket connections. Each connection holds state in the Envoy sidecar. Memory grows from the baseline 100MB to 2GB, hits the container memory limit, and gets OOMKilled. The pod restarts, dropping all 50,000 active connections. Clients reconnect simultaneously, causing a thundering herd that immediately pushes memory back up. The pod enters a crash loop. Detection: envoy_server_memory_allocated trending upward over hours, container_memory_working_set_bytes approaching the limit, container_oom_events_total incrementing. Recovery: increase sidecar memory limits for high-connection services using the sidecar.istio.io/proxyMemoryLimit pod annotation. Set terminationDrainDuration so Envoy drains connections gracefully before shutdown. Prevention: profile sidecar memory usage per service under realistic connection counts. Set per-service resource limits based on actual usage, not defaults. For services with 10K+ long-lived connections, consider excluding them from the mesh or using Ambient mode.
Scenario 3: AuthorizationPolicy Deny-All Blocks Entire Namespace. A team applies an AuthorizationPolicy with action DENY and no matching ALLOW policy in the same namespace. Every request returns 403 RBAC denied. All services in the namespace go dark within seconds. If this namespace contains a dependency for other namespaces, the blast radius cascades. Detection: istio_requests_total{response_code="403"} spikes across the namespace, envoy_http_rbac_denied counter rises on every pod. Recovery: delete the offending AuthorizationPolicy. Traffic resumes within the xDS push latency (typically 2-5 seconds). Prevention: always apply ALLOW policies before adding DENY policies. Test with istioctl experimental authz check <pod> before applying. Run all policy changes through a staging namespace first. Add a CI validation step that simulates policy evaluation against known traffic patterns.
Capacity Planning
Each Envoy sidecar consumes roughly 100MB RAM and 0.1 vCPU at baseline. Under load (1K RPS through the proxy), expect 150MB RAM and 0.5 vCPU. Istiod uses about 1GB RAM per 1,000 sidecars for xDS configuration distribution and certificate management. These numbers double without the Sidecar CRD to scope configuration.
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Sidecar memory per pod | < 150MB | > 250MB | > 400MB |
| Sidecar CPU per pod | < 0.3 vCPU | > 0.7 vCPU | > 1.0 vCPU |
| Istiod memory | < 2GB | > 4GB | > 8GB |
| Istiod CPU | < 2 vCPU | > 4 vCPU | > 8 vCPU |
| xDS push latency (P99) | < 5s | > 15s | > 60s |
| mTLS handshake failures | 0/min | > 10/min | > 100/min |
| Proxy convergence time | < 10s | > 30s | > 120s |
Config scope is the biggest lever. An unconstrained mesh (no Sidecar CRD) pushes the entire mesh configuration to every sidecar. At 200+ services, each proxy receives routing rules, cluster definitions, and endpoint lists for every service in the mesh, even ones it never calls. The Sidecar CRD tells Istiod to only push the config each proxy actually needs. This typically reduces push size by 90% and xDS push latency by a similar margin.
Real-world references: Airbnb runs Istio across 1,500+ services with a dedicated platform team of 4 engineers. AutoTrader UK migrated 400 services to Istio over 18 months, adopting namespace by namespace. Salesforce uses Istio for multi-cluster service mesh across multiple cloud regions. For planning purposes: mesh_memory_overhead = pod_count * 100MB, istiod_memory = (pod_count / 1000) * 1GB. For a 1,000-pod cluster, that's 100GB of additional memory (roughly $400/mo on EC2) plus 1GB for Istiod.
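The planning formulas above translate directly into code. This is a back-of-envelope sketch using the page's rules of thumb (100MB per sidecar, 1GB of Istiod memory per 1,000 sidecars), not a measured model:

```python
def capacity_plan(pod_count: int) -> dict:
    """Rough sidecar-mode mesh sizing, in decimal GB."""
    return {
        "sidecar_memory_gb": pod_count * 100 / 1000,  # 100MB per sidecar
        "istiod_memory_gb": pod_count / 1000 * 1.0,   # 1GB per 1,000 sidecars
    }

print(capacity_plan(1000))
# {'sidecar_memory_gb': 100.0, 'istiod_memory_gb': 1.0}
```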
Architecture Decision Record
ADR: Istio Deployment Mode Selection
Context: Istio has been chosen. Now the decision is which deployment mode fits the team's scale, budget, and operational maturity.
| Criteria (Weight) | Sidecar Mode | Ambient Mode | External Control Plane |
|---|---|---|---|
| Resource overhead (25%) | ~100MB per pod | ~20MB per node (ztunnel) | Same as sidecar + mgmt cluster |
| Feature completeness (20%) | Full L7 everywhere | L4 by default, L7 via waypoint | Full (depends on mode) |
| Operational complexity (20%) | Medium (sidecar injection) | Lower (no injection) | High (separate mgmt cluster) |
| Production maturity (20%) | Battle-tested (since 2018) | GA in 1.22 (newer) | Mature but less common |
| Multi-cluster readiness (15%) | Primary-remote or multi-primary | Supported | Designed for multi-cluster |
Decision framework:
- Under 500 pods, need full L7 features today, team has mesh experience. Go with sidecar mode. It's the most documented, most tested, and most understood deployment model. Every blog post, every Stack Overflow answer, every vendor integration assumes sidecar mode.
- 500-5,000 pods, cost-sensitive, L4 mTLS sufficient for most services. Deploy Ambient mode with ztunnel for L4 everywhere, then add waypoint proxies only for namespaces that need L7 routing or authorization. This delivers mTLS at a fraction of the resource cost.
- Multi-cluster with centralized operations. Run Istiod on a dedicated management cluster that doesn't serve application traffic. Remote clusters connect to this external control plane. Failure of the management cluster doesn't affect running sidecars (stale xDS cache keeps working), but it does block new deployments.
- Migrating from sidecar to Ambient. Run both modes simultaneously. Ambient-enabled namespaces use ztunnel. Sidecar-enabled namespaces keep their proxies. Migrate namespace by namespace, validating metrics and error rates at each step. Plan for 2-3 months.
Key Points
- Istiod is the single-binary control plane that merges Pilot, Citadel, and Galley. It compiles routing rules into Envoy xDS configuration and pushes it to every sidecar over gRPC streaming.
- VirtualService and DestinationRule are the two most-used CRDs. VirtualService controls where traffic goes. DestinationRule controls what happens when it gets there.
- mTLS is automatic through SPIFFE identity. Citadel issues short-lived certificates (24h TTL) with no application code changes required.
- Ambient mode replaces per-pod sidecars with per-node ztunnel (L4) and optional waypoint proxies (L7), cutting resource overhead by 60-80%.
- At 1,000 sidecars, budget roughly 100GB of additional cluster memory and 1GB for Istiod. Know these numbers before committing.
Tool Comparison
| Tool | Type | Best For | Scale |
|---|---|---|---|
| Istio (Sidecar Mode) | Open Source | Full L7 traffic management, policy enforcement, multi-cluster | Large-Enterprise |
| Istio (Ambient Mode) | Open Source | Lower resource overhead, no sidecar injection, L4 by default | Medium-Enterprise |
| Envoy Gateway | Open Source | Kubernetes Gateway API ingress without full mesh overhead | Medium-Large |
| Gloo Mesh | Commercial | Multi-cluster Istio management with enterprise support | Large-Enterprise |
Common Mistakes
- Deploying Istio mesh-wide on day one instead of adopting namespace by namespace. This turns every misconfiguration into a cluster-wide incident.
- Writing VirtualService rules without understanding Envoy route matching precedence. Routes are evaluated in order and the first match wins; it is not longest-prefix matching. Read the Envoy docs, not just the Istio docs.
- Ignoring Istiod resource limits. A single Istiod instance managing 3,000+ sidecars without tuned memory limits will OOMKill during large config pushes.
- Not running istioctl analyze in CI. Invalid CRDs silently break routing, and the problem goes unnoticed until traffic stops flowing.
- Enabling Wasm plugins in production without load-testing them first. A slow plugin adds latency to every single request through that sidecar.
- Leaving PeerAuthentication in PERMISSIVE mode permanently. It's meant for migration, not as a final state. The result is half-mTLS with a false sense of security.
- Skipping revision-based canary upgrades for Istiod itself. In-place upgrades risk dropping xDS connections to all sidecars simultaneously.