Rate Limiting Algorithms
Token Bucket
Bucket holds up to N tokens, refills at R tokens/sec. Each request costs 1 token. Allows bursts up to bucket size. Used by AWS API Gateway, Stripe. Most popular algorithm.
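A minimal token bucket sketch (illustrative, not any particular vendor's implementation); `now` is passed in explicitly so the logic is easy to test:

```python
class TokenBucket:
    """Holds up to `capacity` tokens, refills at `refill_rate` tokens/sec;
    each request spends one token. `now` is a timestamp in seconds."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity   # start full, so an initial burst is allowed
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Production code would read a monotonic clock instead of taking `now` as a parameter.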
Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate. Smooths out bursts — output never exceeds the fixed drain rate. Good for traffic shaping. Used in networking (QoS).
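A leaky bucket sketch in its "meter" form (an assumption — the queue form holds the actual requests instead of a level counter):

```python
class LeakyBucket:
    """Meter variant: the level drains at `leak_rate` units/sec; each request
    adds 1. A request that would overflow `capacity` is rejected."""

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Drain whatever leaked out since the last check.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```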
Fixed Window Counter
Count requests in fixed time windows (e.g., per minute). Reset count at window boundary. Simple but allows 2x burst at window edges (boundary problem).
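A fixed-window sketch (parameter names are illustrative). Note the boundary problem: a client can spend the full limit at the end of one window and again at the start of the next:

```python
class FixedWindowCounter:
    """Allow up to `limit` requests per `window_sec`-second window."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.window_start = None
        self.count = 0

    def allow(self, now: float) -> bool:
        start = now - (now % self.window)  # boundary of the current window
        if start != self.window_start:     # crossed into a new window: reset
            self.window_start, self.count = start, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```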
Sliding Window Log
Store timestamp of each request. Count requests in trailing window. Most accurate but memory-intensive (stores every timestamp). O(n) per check.
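A sliding-window-log sketch using a deque of accepted timestamps (an illustrative choice; Redis sorted sets are a common production store):

```python
from collections import deque

class SlidingWindowLog:
    """Allow up to `limit` requests in any trailing `window_sec` seconds."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the trailing window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```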
Sliding Window Counter
Hybrid: weighted sum of current + previous window counts. estimate = prev_count × (fraction of the previous window still inside the sliding window) + current_count. Good balance of accuracy and memory (two counters per key). Used by Cloudflare.
Token Bucket vs Leaky Bucket
Token Bucket: allows bursts, variable output rate. Leaky Bucket: smooths bursts, constant output rate. Token Bucket better for APIs, Leaky Bucket better for traffic shaping.
Distributed Rate Limiting
Use Redis INCR + EXPIRE for centralized counting (INCR and EXPIRE are two separate commands, so wrap them in a Lua script or MULTI block to set the expiry atomically with the first increment). Or run a local rate limiter on each node and sync counts periodically. Trade-off: accuracy vs the latency of checking shared state on every request.
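A sketch of the fixed-window INCR pattern; here a plain dict stands in for Redis (an assumption — real deployments would use a Redis client with an atomic script, and expired keys would be evicted by EXPIRE rather than kept forever):

```python
def allow_request(store: dict, key: str, limit: int, window_sec: int,
                  now: float) -> bool:
    """Centralized fixed-window check. `store` models Redis:
    the get/set pair below corresponds to INCR on a per-window key,
    with EXPIRE window_sec handling cleanup."""
    window_key = f"{key}:{int(now // window_sec)}"  # one counter per window
    count = store.get(window_key, 0) + 1            # Redis: INCR window_key
    store[window_key] = count                       # Redis: EXPIRE window_key window_sec
    return count <= limit
```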
Rate Limit by What?
By IP (simple, NAT issues). By API key (most common). By user ID (after auth). By endpoint (protect expensive operations). Combine multiple dimensions for defense in depth.
Response Headers
X-RateLimit-Limit: 100 (the ceiling), X-RateLimit-Remaining: 42, X-RateLimit-Reset: 1672531200 (Unix timestamp when the window resets), Retry-After: 30 (seconds to wait). Return 429 Too Many Requests when the limit is exceeded.
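A rejected request carrying these headers might look like (values illustrative; the X-RateLimit-* names are a widespread convention, not a standard):

```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1672531200
Retry-After: 30
Content-Type: application/json

{"error": "rate limit exceeded"}
```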
Graceful Degradation
Don't just reject — degrade. Serve cached responses, reduce quality, queue for later, return partial results. Hard rate limit for abuse, soft limit for good users.
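The soft/hard split above can be sketched as a tiny policy function (entirely hypothetical names; the degraded branch stands in for cached, partial, or queued responses):

```python
def handle(request_count: int, soft_limit: int, hard_limit: int) -> str:
    """Return which tier of service this request gets."""
    if request_count <= soft_limit:
        return "full"       # normal response
    if request_count <= hard_limit:
        return "degraded"   # e.g. cached or partial result for good users
    return "rejected"       # 429 for abusive traffic
```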