Envoy Proxy
The L7 proxy that actually solved the 'every service does networking differently' problem
Why It Exists
Anyone who has run microservices in production knows the pain. The Java services retry with exponential backoff. The Go services use a hand-rolled retry loop. The Python services do not retry at all. Everyone has different timeouts, different circuit breaker logic, different ways of collecting metrics. Now multiply that by 50 services. Good luck debugging a latency spike on a Friday afternoon.
Lyft hit exactly this wall in 2016 and built Envoy. The core idea is simple: pull all the networking concerns out of the application and into a proxy that sits alongside each service. Retries, load balancing, TLS, metrics, tracing. The proxy handles all of it. A Go service and a Java service now behave identically on the network because neither one is making those decisions anymore. Envoy is.
What set Envoy apart from Nginx and HAProxy was that it was built for dynamic configuration from day one. Those older proxies assume a config file is written and reloaded. Envoy assumes config is pushed to it via APIs, and it applies changes immediately. No reload, no restart, no dropped connections. That design choice is what made the whole service mesh wave possible.
How It Works
Proxy Architecture: Envoy is a multi-threaded, non-blocking, event-driven proxy written in C++. Connections flow through a pipeline. Listeners accept incoming connections. Filter Chains process requests through ordered filters (TLS, HTTP codec, routing, rate limiting). Clusters represent upstream service endpoints. Each worker thread runs its own event loop, which keeps lock contention low. This is not a request-per-thread model. It scales well on modern hardware.
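The pipeline above can be sketched as a minimal static configuration. This is an illustrative skeleton, not a production config: the names (`ingress`, `backend`), addresses, and ports are placeholders. A listener accepts connections on 8080, the HTTP connection manager filter routes every request to the `backend` cluster, and the cluster defines the upstream endpoints.

```yaml
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      # Network-level filter that speaks HTTP and hosts the HTTP filter chain
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - name: all
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: backend }
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: backend
    type: STRICT_DNS
    load_assignment:
      cluster_name: backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: backend.internal, port_value: 80 }
```

Every request traverses exactly this path: listener, filter chain, route match, cluster, endpoint.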
xDS APIs: This is the part that takes the most time to learn, and it is also the part that makes Envoy genuinely different. The xDS protocol defines gRPC and REST APIs for each configuration type: LDS (Listener Discovery), RDS (Route Discovery), CDS (Cluster Discovery), EDS (Endpoint Discovery), and SDS (Secret/TLS Discovery). A control plane (Istio, a custom one, or something like go-control-plane) implements these APIs and pushes configuration to Envoy. When routes change, the control plane pushes new RDS config and Envoy applies it immediately. Zero downtime.
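A bootstrap file that wires Envoy to a control plane might look like the sketch below. The control plane address, node id, and cluster names are placeholders; here all resource types are fetched over a single aggregated stream (ADS). Only the xDS cluster itself is static, because Envoy needs somewhere to connect before it has received any configuration.

```yaml
node:
  id: sidecar-1            # identifies this Envoy to the control plane
  cluster: my-service
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
    - envoy_grpc: { cluster_name: xds_cluster }
  lds_config:              # listeners come from the ADS stream
    ads: {}
    resource_api_version: V3
  cds_config:              # clusters come from the ADS stream
    ads: {}
    resource_api_version: V3
static_resources:
  clusters:
  - name: xds_cluster      # the only statically defined cluster: the control plane
    type: STRICT_DNS
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}   # xDS gRPC requires HTTP/2
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: control-plane.internal, port_value: 18000 }
```

From this point on, listeners, routes, clusters, and endpoints all arrive over the gRPC stream; the file above never needs to change.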
HTTP Processing: When an HTTP request hits Envoy, the HTTP connection manager decodes it (HTTP/1.1, HTTP/2, or HTTP/3), runs it through the HTTP filter chain (router, RBAC, rate limit, JWT auth, compression, etc.), matches it to a route, selects an upstream cluster, and forwards the request. Along the way, Envoy generates detailed metrics (request count, latency histograms, error rates per upstream), propagates tracing headers (B3, W3C TraceContext), and writes structured access logs. All of this comes without adding a single library to the application.
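The HTTP filter chain is an ordered list inside the connection manager. A minimal sketch with a local rate limiter ahead of the router (the token bucket numbers are arbitrary examples):

```yaml
http_filters:
# Filters run in list order; each can inspect, mutate, or short-circuit the request
- name: envoy.filters.http.local_ratelimit
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
    stat_prefix: http_local_rate_limiter
    token_bucket:
      max_tokens: 100        # burst capacity
      tokens_per_fill: 100
      fill_interval: 1s      # i.e. ~100 requests/second steady state
# The router must be the last filter: it is what actually forwards upstream
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

Ordering matters: a request rejected by the rate limiter never reaches the router, and a filter placed after the router never runs at all.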
Architecture Deep Dive
Service Mesh Pattern: In a service mesh like Istio, Envoy runs as a sidecar container in every pod. Kubernetes network rules (iptables or eBPF) redirect all pod traffic through the local Envoy. Every request, incoming and outgoing, flows through the proxy. The control plane (Istio's istiod) watches Kubernetes services and pushes endpoint updates to all Envoys via EDS. Mutual TLS certificates get automatically provisioned and rotated via SDS. No certificate is ever touched manually.
Load Balancing: Envoy ships with multiple algorithms. Round Robin is the default and good enough for most cases. Least Request routes to the host with the fewest active requests, which is the right choice when backends have variable latency. Ring Hash provides consistent hashing for cache affinity. Maglev is Google's consistent hashing algorithm with better distribution than ring hash. Zone-aware routing prefers local zone backends, which cuts cross-zone latency and saves on egress costs. Pick the algorithm that matches the actual traffic pattern, not the one that sounds most impressive.
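Load balancing is configured per cluster. A sketch of the least-request policy with zone-aware routing enabled (the threshold value is illustrative):

```yaml
clusters:
- name: backend
  lb_policy: LEAST_REQUEST       # route to the host with the fewest in-flight requests
  least_request_lb_config:
    choice_count: 2              # "power of two choices": sample 2 hosts, pick the less loaded
  common_lb_config:
    zone_aware_lb_config:
      min_cluster_size: 6        # only prefer local-zone hosts once enough healthy hosts exist
```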
Resilience Features: Retries are configurable per route. Set the conditions (5xx, gateway-error, reset, connect-failure), budget limits, and backoff. Get these right or they will amplify outages. Circuit Breaking caps concurrent connections, pending requests, and retries per upstream, which prevents resource exhaustion. Outlier Detection passively watches upstream errors and ejects unhealthy hosts. It reacts faster than active health checks for transient failures, but the defaults are aggressive. Tune them.
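These three features live in two places: retries on the route, circuit breakers and outlier detection on the cluster. A hedged sketch with illustrative numbers (tune every value for the actual service):

```yaml
# Route-level: retry policy
route:
  cluster: backend
  retry_policy:
    retry_on: "5xx,reset,connect-failure"
    num_retries: 2
    per_try_timeout: 1s
    retry_back_off:
      base_interval: 0.025s
      max_interval: 0.25s

# Cluster-level: circuit breakers and passive outlier detection
clusters:
- name: backend
  circuit_breakers:
    thresholds:
    - max_connections: 1024
      max_pending_requests: 256
      max_retries: 3             # caps concurrent retries cluster-wide
  outlier_detection:
    consecutive_5xx: 5           # eject after 5 consecutive 5xx responses
    interval: 10s
    base_ejection_time: 30s
    max_ejection_percent: 50     # never eject more than half the cluster
```

The `max_ejection_percent` cap is the safety valve: without it, a correlated failure could eject every host and blackhole the cluster.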
Wasm Extensibility: Envoy supports WebAssembly (Wasm) filters for custom logic. Filters can be written in C++, Rust, Go, or AssemblyScript, compiled to Wasm, and loaded at runtime. This is the escape hatch for custom authentication, header manipulation, or protocol handling. It avoids forking Envoy or waiting months for upstream changes. The developer experience is still rough compared to writing native code, but it is getting better.
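Loading a compiled Wasm module is just another entry in the HTTP filter chain. The filter name and file path below are hypothetical; the module itself would be built separately with a proxy-wasm SDK:

```yaml
http_filters:
- name: envoy.filters.http.wasm
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
    config:
      name: my_header_filter               # hypothetical filter name
      vm_config:
        runtime: envoy.wasm.runtime.v8     # the built-in V8 Wasm runtime
        code:
          local:
            filename: /etc/envoy/my_header_filter.wasm
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```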
Google Cloud's Traffic Director uses Envoy as its data plane. Stripe runs all API traffic through Envoy for load balancing and observability. The project has over 25,000 GitHub stars and sits at the foundation of the service mesh ecosystem. It is not going anywhere.
Deployment Patterns
Sidecar proxy: one Envoy per service instance. Maximum isolation, but the highest resource overhead. This is the Istio default and the most common pattern.
Per-node proxy (ambient mesh): one Envoy per Kubernetes node, shared by all pods on that node. Lower overhead, less isolation. Istio's ambient mesh is pushing this model, and it is worth watching if sidecar memory costs are a concern.
Edge proxy: Envoy as the API gateway handling external traffic with TLS termination, authentication, and rate limiting. Many teams start here before adopting sidecars internally.
Most production deployments combine patterns. Edge Envoy for external traffic, sidecar Envoys (or ambient) for inter-service communication. Start with what solves the current problem, and expand when the need arises.
Pros
- • Understands L7 protocols (HTTP/2, gRPC, WebSocket, MongoDB, Redis), so it can make smart routing decisions
- • Dynamic configuration via xDS APIs. No restarts, no reloads. Config just shows up.
- • Ships with Prometheus metrics, distributed tracing, and structured access logs out of the box
- • Battle-tested at serious scale (Lyft, Google, Stripe, Airbnb)
- • CNCF graduated project with a stable API and an active community
Cons
- • Each sidecar eats 50-100MB of memory. That adds up fast.
- • The xDS API has a steep learning curve. Expect a few weeks before your team is comfortable.
- • Sidecar proxying adds 0.5-2ms of tail latency per hop
- • Debugging proxy issues means you need to understand L7 protocol internals
- • Filter chain ordering is easy to mess up, and misconfigurations cause subtle routing bugs
When to use
- • Microservices that need consistent load balancing and observability across languages
- • Service mesh deployments (Istio, Consul Connect, or your own custom setup)
- • You need canary deployments, traffic shifting, or fault injection
- • Zero-trust networking with mutual TLS between all services
When NOT to use
- • A monolith with no inter-service communication. Envoy has nothing to do here.
- • Environments where 50-100MB of memory overhead per pod is a dealbreaker
- • Teams without the bandwidth to learn and operate L7 proxy infrastructure
- • Pure L4 load balancing needs. Just use IPVS or a simpler L4 proxy.
Key Points
- • xDS (discovery service) APIs provide fully dynamic configuration. Envoy's listeners, routes, clusters, and endpoints all get pushed from a control plane. Unlike Nginx or HAProxy, Envoy never needs a reload or restart for config changes.
- • The sidecar pattern puts an Envoy proxy next to each service instance. All inbound and outbound traffic flows through the proxy, delivering uniform load balancing, mTLS, retries, and observability without changing a single line of application code.
- • Envoy's HTTP connection manager runs every L7 request through a filter chain. Filters handle routing, rate limiting, authentication, CORS, compression, and more. The chain is configurable and extensible with custom Wasm filters.
- • Outlier detection (a form of passive health checking, distinct from circuit breaking) ejects unhealthy upstream hosts based on consecutive errors (5xx, timeouts, connection failures). It stops traffic to failing instances before active health checks even notice, which prevents cascading failures.
- • Maglev consistent hashing provides connection-affinity load balancing that minimizes disruption during scaling events. Adding or removing one server remaps only about 1/N of keys instead of nearly all of them. This matters a lot for caching layers and stateful services.
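Consistent-hash policies only take effect when the route tells Envoy what to hash on. A sketch combining a Maglev cluster with a header-based hash policy (cluster name, header name, and table size are illustrative):

```yaml
clusters:
- name: cache_backend
  lb_policy: MAGLEV
  maglev_lb_config:
    table_size: 65537            # must be prime; larger tables distribute more evenly

routes:
- match: { prefix: "/" }
  route:
    cluster: cache_backend
    hash_policy:                 # the key that determines host affinity
    - header: { header_name: x-session-id }
```

Requests carrying the same `x-session-id` consistently land on the same upstream host, which is what keeps cache hit rates stable during scaling.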
Common Mistakes
- ✗ Leaving outlier detection thresholds at defaults. The defaults are aggressive: 5 consecutive 5xx errors eject a host for 30 seconds. During transient failures, that causes thundering herd on the remaining hosts. Tune these numbers for the service's actual error profile.
- ✗ Forgetting Envoy's resource consumption in capacity planning. Each sidecar uses 50-100MB RAM and 0.1-0.5 CPU cores. For a cluster with 5,000 pods, that is 250-500GB of RAM just for Envoy sidecars.
- ✗ Misconfiguring retry policies. Default retries on 5xx without idempotency checks amplify traffic during outages. Always set retry budgets and only retry on retriable status codes (502, 503, 504).
- ✗ Skipping header-based routing for canary deployments. Envoy's route matching supports header, path, and query parameter matching for fine-grained traffic splitting. Weight-based splitting alone is harder to control.
- ✗ Ignoring the admin interface during debugging. Envoy exposes /clusters, /config_dump, /stats, and /logging endpoints that show the complete runtime state. Most debugging starts with config_dump to verify what Envoy actually received from the control plane.
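The retry-amplification mistake above has a direct config-level fix: combine a retry budget on the cluster with an explicit list of retriable status codes on the route. A sketch with illustrative values:

```yaml
# Cluster-level: cap retries as a fraction of active traffic
circuit_breakers:
  thresholds:
  - retry_budget:
      budget_percent: { value: 20.0 }   # retries may be at most 20% of active requests
      min_retry_concurrency: 3          # but always allow a few retries at low traffic

# Route-level: only retry status codes that are safe to retry
route:
  retry_policy:
    retry_on: "retriable-status-codes,reset,connect-failure"
    retriable_status_codes: [502, 503, 504]
    num_retries: 2
```

With the budget in place, a widespread outage cannot turn every failed request into several more; retry traffic is bounded relative to real traffic.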