Load Balancing Algorithms
Round Robin
Requests are distributed sequentially across servers, wrapping back to the first after the last. Simple: the only state is the position of the next server. Works well when servers are identical and requests are uniform in cost.
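A minimal sketch of the rotation, assuming a fixed in-memory list of servers. The class name RoundRobinBalancer and the server names are hypothetical, chosen only for illustration.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed, repeating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # itertools.cycle keeps the rotation position for us.
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


# Hypothetical server names.
rr = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [rr.pick() for _ in range(5)]
print(picks)
```

The fourth pick wraps around to the first server again, which is all the "state" the algorithm carries.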
Weighted Round Robin
Each server gets a weight proportional to its capacity. A server with weight 3 gets 3x the traffic of a server with weight 1. Good for heterogeneous hardware.
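One way to honor the weights is sketched below. It uses the "smooth" variant of weighted round robin, which interleaves picks instead of sending a burst to the heaviest server; that choice is an assumption, since the notes above do not name a variant. The class name and server names are hypothetical.

```python
from collections import Counter


class SmoothWeightedRoundRobin:
    """Smooth weighted round robin over a {server: weight} mapping."""

    def __init__(self, weights):
        self._weights = dict(weights)
        self._current = {server: 0 for server in self._weights}
        self._total = sum(self._weights.values())

    def pick(self):
        # Every server earns credit equal to its weight...
        for server, weight in self._weights.items():
            self._current[server] += weight
        # ...the server with the most credit is chosen (ties go to the
        # first server in insertion order)...
        chosen = max(self._current, key=self._current.get)
        # ...and pays back the total weight so others get a turn.
        self._current[chosen] -= self._total
        return chosen


# Hypothetical servers: "big" has three times the capacity of "small".
wrr = SmoothWeightedRoundRobin({"big": 3, "small": 1})
sequence = [wrr.pick() for _ in range(8)]
counts = Counter(sequence)
print(sequence, counts)
```

Over any full cycle of 4 picks, "big" is chosen 3 times and "small" once, matching the 3:1 weights.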
Least Connections
Routes to the server with fewest active connections. Best for long-lived connections (WebSocket, DB pools) or variable request durations.
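A minimal sketch, assuming the balancer itself tracks when connections open and close. The class name and the acquire/release method names are hypothetical.

```python
class LeastConnectionsBalancer:
    """Tracks active connections per server and picks the least loaded."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties resolve to the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the connection closes.
        self.active[server] -= 1


lb = LeastConnectionsBalancer(["a", "b"])
first = lb.acquire()   # both idle, so the first server is chosen
second = lb.acquire()  # "a" is busy, so "b" is chosen
lb.release(first)      # the connection to "a" closes
third = lb.acquire()   # "a" is idle again and wins
```

Unlike round robin, the choice depends on when connections finish, which is why this suits requests of uneven duration.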
Weighted Least Connections
Combines least connections with server weights. Picks the server with lowest (active_connections / weight) ratio.
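The ratio rule can be sketched in a few lines. The function name and the sample numbers are hypothetical, and weights are assumed to be positive.

```python
def pick_weighted_least_connections(active, weights):
    """Return the server with the lowest active_connections / weight.

    `active` maps server -> open connections, `weights` maps
    server -> positive capacity weight (assumed, not validated here).
    """
    return min(active, key=lambda server: active[server] / weights[server])


weights = {"big": 4, "small": 1}

# big: 6 / 4 = 1.5, small: 3 / 1 = 3.0, so "big" still has headroom.
choice_a = pick_weighted_least_connections({"big": 6, "small": 3}, weights)

# big: 8 / 4 = 2.0, small: 1 / 1 = 1.0, so "small" is now the better pick.
choice_b = pick_weighted_least_connections({"big": 8, "small": 1}, weights)
```

Note that "big" is chosen in the first case even though it has more raw connections; the weight is what makes the comparison fair.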
IP Hash
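Hashes the client IP to pick a server. The same client hits the same server as long as the server pool is unchanged, which gives session affinity without sticky cookies. Clients behind a shared NAT or proxy all land on the same server.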
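A minimal sketch of the mapping, assuming a plain modulo over the server list. The function name is hypothetical. It uses hashlib rather than the built-in hash(), because Python salts string hashes per process and the mapping must be stable across restarts.

```python
import hashlib


def pick_by_ip(client_ip, servers):
    """Map a client IP string to one server, deterministically."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce modulo the pool size.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]
first_visit = pick_by_ip("203.0.113.7", servers)
second_visit = pick_by_ip("203.0.113.7", servers)
```

Because the index depends on len(servers), changing the pool size reshuffles most clients, which is the weakness consistent hashing addresses.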
Consistent Hashing
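Maps servers and requests onto a hash ring; each request goes to the next server clockwise from its hash. Adding or removing one of N servers remaps only about 1/N of the keys, where plain modulo hashing would remap most of them. Used in CDNs and distributed caches (e.g. Memcached client libraries).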
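The ring can be sketched with a sorted list and binary search. This version places each server at several positions ("virtual nodes") to even out the distribution; that detail is an assumption not stated in the notes above, as are the class name HashRing, the vnodes parameter, and the cache names.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self._vnodes):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        # First ring position clockwise from the key, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("cache-4")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that moves goes to the newly added server; keys owned by the other servers stay where they were, which is the property that keeps cache hit rates stable during scaling.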
Least Response Time
Routes to the server with the best combination of low average response time and few active connections. Adaptive, but requires continuous latency measurement, which adds monitoring overhead.
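A rough sketch of one possible scoring rule. How latency and connection count are combined varies by implementation; the formula here (average latency multiplied by active connections plus one), the exponential moving average, and all names are assumptions made for illustration.

```python
class LeastResponseTimeBalancer:
    """Scores servers by avg_latency * (active_connections + 1)."""

    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha  # weight of the newest latency sample
        self.avg_ms = {server: 0.0 for server in servers}  # 0.0 = unmeasured
        self.active = {server: 0 for server in servers}

    def pick(self):
        # Unmeasured servers score 0, so each one gets tried early.
        server = min(
            self.avg_ms,
            key=lambda s: self.avg_ms[s] * (self.active[s] + 1),
        )
        self.active[server] += 1
        return server

    def record(self, server, elapsed_ms):
        """Report a finished request and fold its latency into the average."""
        self.active[server] -= 1
        previous = self.avg_ms[server]
        if previous == 0.0:
            self.avg_ms[server] = elapsed_ms
        else:
            self.avg_ms[server] = (1 - self.alpha) * previous + self.alpha * elapsed_ms


lb = LeastResponseTimeBalancer(["fast", "slow"])
s1 = lb.pick()          # nothing measured yet, first server wins the tie
lb.record(s1, 20.0)
s2 = lb.pick()          # "slow" is still unmeasured, so it is tried next
lb.record(s2, 200.0)
s3 = lb.pick()          # 20 ms beats 200 ms
```

The record call is the monitoring overhead mentioned above: every response has to be timed and reported back to the balancer.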
Random
Picks a random server. Surprisingly effective at scale due to the law of large numbers: over many requests, each server's share converges to 1/N. Zero state, zero coordination.
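A short sketch that also illustrates the convergence. The function name is hypothetical, and the seeded generator is only there to make the run repeatable.

```python
import random
from collections import Counter


def pick_random(servers, rng=random):
    """Choose a server uniformly at random; no state is kept."""
    return rng.choice(servers)


servers = ["app-1", "app-2", "app-3"]
rng = random.Random(42)  # seeded only so the example is reproducible
counts = Counter(pick_random(servers, rng) for _ in range(30_000))
print(counts)
```

With 30,000 requests across 3 servers, each count lands close to 10,000. With only a handful of requests the split can be quite uneven, which is why the approach works best at scale.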
L4 vs L7 Load Balancing
L4 (transport) routes by IP address and port: fast, no payload inspection. L7 (application) can route by URL, headers, or cookies: more flexible, but costs more CPU.
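The difference comes down to what information each layer can see when making its decision. The sketch below is illustrative only: the function names, the routing rules, and the pool names are all hypothetical, and a real L4 balancer works on packets rather than Python strings.

```python
import hashlib


def route_l4(src_ip, src_port, servers):
    """L4-style decision: only addresses and ports are visible."""
    key = f"{src_ip}:{src_port}".encode("utf-8")
    index = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(servers)
    return servers[index]


def route_l7(path, headers, pools):
    """L7-style decision: the parsed request is visible."""
    if path.startswith("/api/"):
        return pools["api"]
    if "beta=1" in headers.get("Cookie", ""):
        return pools["beta"]
    return pools["web"]


# Hypothetical pool names.
pools = {"api": "api-pool", "beta": "beta-pool", "web": "web-pool"}

api_target = route_l7("/api/users", {}, pools)
beta_target = route_l7("/home", {"Cookie": "session=abc; beta=1"}, pools)
web_target = route_l7("/home", {}, pools)
l4_target = route_l4("203.0.113.7", 51234, ["app-1", "app-2"])
```

The L7 function needs the request parsed before it can run, which is where the extra CPU cost comes from.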
When to Use What
Stateless APIs → Round Robin. Mixed hardware → Weighted RR. Long connections → Least Connections. Session affinity → IP Hash. Caches → Consistent Hashing.