Rate Limiting Algorithms
Token Bucket
Bucket holds up to N tokens, refills at R tokens/sec. Each request costs 1 token. Allows bursts up to bucket size. Used by AWS API Gateway, Stripe. Most popular algorithm.
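A minimal token bucket sketch (illustrative, not any particular vendor's implementation); `now` is passed in explicitly so the logic is easy to test:

```python
class TokenBucket:
    """Holds up to `capacity` tokens, refills at `refill_rate` tokens/sec;
    each request spends one token. `now` is a timestamp in seconds."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity   # start full, so an initial burst is allowed
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Production code would read a monotonic clock instead of taking `now` as a parameter.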
Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate. Smooths out bursts — output never exceeds the fixed drain rate. Good for traffic shaping. Used in networking (QoS).
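A leaky bucket sketch in its "meter" form (an assumption — the queue form holds the actual requests instead of a level counter):

```python
class LeakyBucket:
    """Meter variant: the level drains at `leak_rate` units/sec; each request
    adds 1. A request that would overflow `capacity` is rejected."""

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.level = 0.0
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Drain whatever leaked out since the last check.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```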
Fixed Window Counter
Count requests in fixed time windows (e.g., per minute). Reset count at window boundary. Simple but allows 2x burst at window edges (boundary problem).
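A fixed-window sketch (parameter names are illustrative). Note the boundary problem: a client can spend the full limit at the end of one window and again at the start of the next:

```python
class FixedWindowCounter:
    """Allow up to `limit` requests per `window_sec`-second window."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.window_start = None
        self.count = 0

    def allow(self, now: float) -> bool:
        start = now - (now % self.window)  # boundary of the current window
        if start != self.window_start:     # crossed into a new window: reset
            self.window_start, self.count = start, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```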
Sliding Window Log
Store timestamp of each request. Count requests in trailing window. Most accurate but memory-intensive (stores every timestamp). O(n) per check.
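A sliding-window-log sketch using a deque of accepted timestamps (an illustrative choice; Redis sorted sets are a common production store):

```python
from collections import deque

class SlidingWindowLog:
    """Allow up to `limit` requests in any trailing `window_sec` seconds."""

    def __init__(self, limit: int, window_sec: float):
        self.limit = limit
        self.window = window_sec
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the trailing window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```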
Sliding Window Counter
Hybrid: weighted sum of current + previous window counts. estimate = prev_count × (fraction of the previous window still inside the sliding window) + current_count. Good balance of accuracy and memory (two counters per key). Used by Cloudflare.
Token Bucket vs Leaky Bucket
Token Bucket: allows bursts, variable output rate. Leaky Bucket: smooths bursts, constant output rate. Token Bucket better for APIs, Leaky Bucket better for traffic shaping.
Distributed Rate Limiting
Use Redis INCR + EXPIRE for centralized counting (INCR and EXPIRE are two separate commands, so wrap them in a Lua script or MULTI block to set the expiry atomically with the first increment). Or run a local rate limiter on each node and sync counts periodically. Trade-off: accuracy vs the latency of checking shared state on every request.
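A sketch of the fixed-window INCR pattern; here a plain dict stands in for Redis (an assumption — real deployments would use a Redis client with an atomic script, and expired keys would be evicted by EXPIRE rather than kept forever):

```python
def allow_request(store: dict, key: str, limit: int, window_sec: int,
                  now: float) -> bool:
    """Centralized fixed-window check. `store` models Redis:
    the get/set pair below corresponds to INCR on a per-window key,
    with EXPIRE window_sec handling cleanup."""
    window_key = f"{key}:{int(now // window_sec)}"  # one counter per window
    count = store.get(window_key, 0) + 1            # Redis: INCR window_key
    store[window_key] = count                       # Redis: EXPIRE window_key window_sec
    return count <= limit
```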
Rate Limit by What?
By IP (simple, NAT issues). By API key (most common). By user ID (after auth). By endpoint (protect expensive operations). Combine multiple dimensions for defense in depth.
Response Headers
X-RateLimit-Limit: 100 (the ceiling), X-RateLimit-Remaining: 42, X-RateLimit-Reset: 1672531200 (Unix timestamp when the window resets), Retry-After: 30 (seconds to wait). Return 429 Too Many Requests when the limit is exceeded.
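A rejected request carrying these headers might look like (values illustrative; the X-RateLimit-* names are a widespread convention, not a standard):

```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1672531200
Retry-After: 30
Content-Type: application/json

{"error": "rate limit exceeded"}
```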
Graceful Degradation
Don't just reject — degrade. Serve cached responses, reduce quality, queue for later, return partial results. Hard rate limit for abuse, soft limit for good users.
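The soft/hard split above can be sketched as a tiny policy function (entirely hypothetical names; the degraded branch stands in for cached, partial, or queued responses):

```python
def handle(request_count: int, soft_limit: int, hard_limit: int) -> str:
    """Return which tier of service this request gets."""
    if request_count <= soft_limit:
        return "full"       # normal response
    if request_count <= hard_limit:
        return "degraded"   # e.g. cached or partial result for good users
    return "rejected"       # 429 for abusive traffic
```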