Rate Limiter
Token bucket for smooth throughput, sliding window for strict counting. Strategy pattern lets you swap algorithms without rewiring the system.
Key Abstractions
- RateLimiter: Facade that checks if a request is allowed and delegates to the active strategy
- RateLimitStrategy: Interface for swappable algorithms. Each strategy decides allow or reject independently.
- TokenBucketStrategy: Refills tokens at a fixed rate; each request costs one token. Allows controlled bursts.
- SlidingWindowStrategy: Tracks timestamps in a rolling window. Strict count-based limiting with no burst allowance.
- RateLimitConfig: Immutable config holding capacity, refill rate, and window size parameters
Class Diagram
How It Works
Rate limiting is about saying "no" gracefully. You have a finite resource (API endpoints, database connections, message queues) and you need to prevent any single client from consuming more than their fair share. The question is how you count and enforce those limits.
Token bucket is the most popular approach in production systems. Picture a bucket that holds, say, 10 tokens. Each request costs one token. A background process (or lazy calculation) adds tokens back at a steady rate. If the bucket is empty, the request gets rejected. This naturally allows short bursts (a full bucket lets a client fire 10 requests at once) while enforcing a long-term average rate.
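The refill arithmetic can be sketched as a pure function (an illustrative sketch with explicit timestamps rather than a real clock; the `refill` name is hypothetical, not part of the implementation below):

```python
def refill(tokens: float, last_refill: float, now: float,
           rate: float, capacity: float) -> float:
    """Tokens earned since the last check, capped at bucket capacity."""
    return min(capacity, tokens + (now - last_refill) * rate)

# Bucket of 10, refilling 2 tokens/sec, empty at t=0.
print(refill(tokens=0.0, last_refill=0.0, now=1.5, rate=2.0, capacity=10.0))
# 1.5 seconds * 2 tokens/sec = 3.0 tokens

# After a long idle stretch the bucket is simply full again, in one step.
print(refill(0.0, 0.0, now=60.0, rate=2.0, capacity=10.0))  # capped at 10.0
```

Because the calculation depends only on elapsed time, it works the same whether it runs every request or once an hour.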
Sliding window is the stricter alternative. You keep a log of timestamps for each request. When a new request comes in, you count how many timestamps fall within the last N seconds. If you're at the limit, reject. No bursts allowed. The downside is memory: you're storing a timestamp per request instead of a single counter.
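The rolling count can be sketched the same way (illustrative, with timestamps passed in explicitly; `allow` here is a standalone helper, not the class method from the implementation section):

```python
from collections import deque

def allow(timestamps: deque, now: float, window: float, limit: int) -> bool:
    """Prune entries older than the window, then check the count."""
    while timestamps and timestamps[0] <= now - window:
        timestamps.popleft()
    if len(timestamps) < limit:
        timestamps.append(now)
        return True
    return False

log = deque()
# Limit 3 per 60-second window.
print([allow(log, t, 60.0, 3) for t in (0.0, 1.0, 2.0, 3.0)])
# [True, True, True, False] - the fourth request exceeds the count
print(allow(log, 61.0, 60.0, 3))  # True - the oldest entries have expired
```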
Requirements
Functional
- allow(clientId) returns true if the request should proceed, false if it should be throttled
- Support per-client rate tracking (different clients have independent limits)
- Support at least two algorithms: token bucket and sliding window
- Algorithms should be swappable at configuration time
Non-Functional
- O(1) for token bucket decisions, O(n) worst case for sliding window pruning (amortized O(1))
- Thread-safe for concurrent request processing
- No background threads for token refill. Lazy computation is sufficient and simpler.
Design Decisions
Why token bucket as the primary algorithm?
Token bucket gives you two useful properties in one package. First, it enforces a long-term average rate (the refill rate). Second, it permits controlled bursts (up to bucket capacity). Most real APIs want this behavior. A mobile app might batch several API calls on launch, and that's fine as long as the average stays within bounds. Sliding window would reject those burst requests even though the overall rate is acceptable.
Why not fixed window counting?
Fixed window has the boundary burst problem. Imagine a limit of 100 requests per minute, window starting at :00. A client sends 100 requests at :59, then 100 more at :01. That's 200 requests in 2 seconds, double the intended rate. Sliding window solves this by always looking at the last 60 seconds from "right now." Token bucket solves it differently by making burst capacity explicit. Fixed window is simpler to implement but the edge case is a real production problem.
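A toy counter makes the boundary problem concrete (a hedged sketch with made-up names and an injected clock):

```python
def fixed_window_allow(counts: dict[int, int], now: float,
                       window: float, limit: int) -> bool:
    """Naive fixed-window counter: buckets requests by window index."""
    bucket = int(now // window)  # window index: minute 0, minute 1, ...
    if counts.get(bucket, 0) < limit:
        counts[bucket] = counts.get(bucket, 0) + 1
        return True
    return False

counts: dict[int, int] = {}
limit, window = 100, 60.0

# 100 requests at t=59s (end of minute 0) all pass...
late_burst = sum(fixed_window_allow(counts, 59.0, window, limit) for _ in range(100))
# ...and 100 more at t=61s (start of minute 1) also pass.
early_burst = sum(fixed_window_allow(counts, 61.0, window, limit) for _ in range(100))
print(late_burst + early_burst)  # 200 requests allowed within 2 seconds
```

A sliding window over the same stream would reject the second burst, because all 200 timestamps fall inside one 60-second span.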
Why lazy refill instead of a timer thread?
A timer thread seems natural. Every second, iterate all buckets and add tokens. But if you have 100,000 active clients, you're updating 100,000 counters every tick even when most of them aren't making requests. Lazy refill computes the tokens earned since the last call at request time. You do a tiny bit of math per request, but you never waste work on idle clients. Less code, less CPU, fewer concurrency headaches.
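Back-of-the-envelope numbers for the workload described above (the activity figures are illustrative assumptions, not measurements):

```python
clients = 100_000   # registered clients with bucket state
active = 50         # clients actually sending requests right now
seconds = 60

# Timer thread: touch every bucket on every one-second tick, active or not.
timer_updates = clients * seconds

# Lazy refill: one small calculation per request; idle clients cost nothing.
requests_per_active_client = 10
lazy_updates = active * requests_per_active_client

print(timer_updates)  # 6,000,000 counter updates per minute
print(lazy_updates)   # 500
```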
Why Strategy + Factory over a simple if/else?
Right now we have two algorithms. But production systems often add more: leaky bucket, sliding window counter (approximate), adaptive rate limiting. Each new algorithm shouldn't require touching the RateLimiter class. Strategy isolates each algorithm. Factory encapsulates the mapping from config to strategy. Adding a new algorithm means writing a new class and one new case in the factory. Nothing else changes.
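As a sketch of that extension path, assuming a trimmed-down version of the Strategy interface, a hypothetical `NoOpStrategy` (e.g. for disabling limits in dev environments) slots in with one new class and one factory branch:

```python
from abc import ABC, abstractmethod

class RateLimitStrategy(ABC):
    @abstractmethod
    def allow(self, client_id: str) -> bool: ...

class NoOpStrategy(RateLimitStrategy):
    """Hypothetical new algorithm: never throttles."""
    def allow(self, client_id: str) -> bool:
        return True

# The factory gains one branch; the RateLimiter facade is untouched.
def create(algorithm: str) -> RateLimitStrategy:
    if algorithm == "noop":
        return NoOpStrategy()
    raise ValueError(f"Unknown algorithm: {algorithm}")

print(create("noop").allow("user_1"))  # True
```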
Interview Follow-ups
- "How would you handle distributed rate limiting?" Move the token state to a shared store like Redis. Use Lua scripting for atomic read-modify-write of the token count. Redis MULTI/EXEC also works, but a Lua script runs as a single atomic operation in Redis's single-threaded command loop, which eliminates read-modify-write races.
- "What about rate limiting by IP vs. by user vs. by API key?" The client_id parameter is already abstract. You extract the right identifier in middleware and pass it to allow(). You can even compose limiters (global + per-user) by chaining multiple checks.
- "How would you communicate limits to clients?" HTTP headers: X-RateLimit-Limit for the max, X-RateLimit-Remaining for current tokens, X-RateLimit-Reset for when the window resets or tokens refill. Return 429 Too Many Requests when rejected.
- "What if you need different limits for different endpoints?" Create a limiter per endpoint (or per endpoint group) with its own config. A map from endpoint pattern to RateLimiter instance handles the routing.
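The last two answers can be sketched together; `SimpleLimiter` below is an illustrative stand-in for the real RateLimiter, not the class from the implementation section:

```python
class SimpleLimiter:
    """Stand-in limiter: allows the first `limit` calls per key."""
    def __init__(self, limit: int):
        self._limit = limit
        self._counts: dict[str, int] = {}

    def allow(self, key: str) -> bool:
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self._limit

# Composition: a request must pass both the global and the per-user limiter.
# Note the short-circuit: the global limiter is charged even when the
# per-user check rejects, a tradeoff real systems have to decide on.
global_limiter = SimpleLimiter(limit=1000)
per_user_limiter = SimpleLimiter(limit=3)

def allow_request(user_id: str) -> bool:
    return global_limiter.allow("global") and per_user_limiter.allow(user_id)

# Per-endpoint routing: map each endpoint pattern to its own limiter.
endpoint_limiters = {
    "/search": SimpleLimiter(limit=2),    # expensive endpoint, tight limit
    "/status": SimpleLimiter(limit=100),  # cheap endpoint, loose limit
}

print([allow_request("u1") for _ in range(4)])  # [True, True, True, False]
print([endpoint_limiters["/search"].allow("u1") for _ in range(3)])
# [True, True, False]
```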
Code Implementation
```python
from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass
from enum import Enum
from threading import Lock
import time


class AlgorithmType(Enum):
    TOKEN_BUCKET = "token_bucket"
    SLIDING_WINDOW = "sliding_window"


@dataclass(frozen=True)
class RateLimitConfig:
    """Immutable configuration for rate limiting."""
    capacity: int        # Max tokens or max requests in window
    refill_rate: float   # Tokens per second (token bucket only)
    window_size_ms: int  # Window duration in ms (sliding window only)

    def __post_init__(self):
        if self.capacity <= 0:
            raise ValueError("capacity must be positive")


class RateLimitStrategy(ABC):
    @abstractmethod
    def allow(self, client_id: str) -> bool: ...


@dataclass
class _TokenState:
    tokens: float
    last_refill: float


class TokenBucketStrategy(RateLimitStrategy):
    """
    Lazy refill token bucket. No background threads.
    Tokens accumulate based on elapsed time since last check.
    """

    def __init__(self, config: RateLimitConfig):
        self._config = config
        self._buckets: dict[str, _TokenState] = {}
        self._lock = Lock()

    def allow(self, client_id: str) -> bool:
        with self._lock:
            now = time.monotonic()

            if client_id not in self._buckets:
                self._buckets[client_id] = _TokenState(
                    tokens=self._config.capacity, last_refill=now
                )

            state = self._buckets[client_id]

            # Lazy refill: compute tokens earned since last check
            elapsed = now - state.last_refill
            earned = elapsed * self._config.refill_rate
            state.tokens = min(self._config.capacity, state.tokens + earned)
            state.last_refill = now

            if state.tokens >= 1.0:
                state.tokens -= 1.0
                return True
            return False


class SlidingWindowStrategy(RateLimitStrategy):
    """
    Tracks exact timestamps in a rolling window.
    Strict counting with no burst tolerance.
    """

    def __init__(self, config: RateLimitConfig):
        self._config = config
        self._windows: dict[str, deque[float]] = {}
        self._lock = Lock()

    def allow(self, client_id: str) -> bool:
        with self._lock:
            now = time.monotonic()
            window_start = now - (self._config.window_size_ms / 1000.0)

            if client_id not in self._windows:
                self._windows[client_id] = deque()

            timestamps = self._windows[client_id]

            # Prune expired timestamps
            while timestamps and timestamps[0] < window_start:
                timestamps.popleft()

            if len(timestamps) < self._config.capacity:
                timestamps.append(now)
                return True
            return False


class StrategyFactory:
    """Creates the right strategy from algorithm type and config."""

    @staticmethod
    def create(algorithm: AlgorithmType, config: RateLimitConfig) -> RateLimitStrategy:
        if algorithm == AlgorithmType.TOKEN_BUCKET:
            return TokenBucketStrategy(config)
        elif algorithm == AlgorithmType.SLIDING_WINDOW:
            return SlidingWindowStrategy(config)
        raise ValueError(f"Unknown algorithm: {algorithm}")


class RateLimiter:
    """Facade. Delegates all decisions to the active strategy."""

    def __init__(self, strategy: RateLimitStrategy):
        self._strategy = strategy

    def allow(self, client_id: str) -> bool:
        return self._strategy.allow(client_id)


if __name__ == "__main__":
    # Token Bucket: 5 tokens, refill 2 per second
    config = RateLimitConfig(capacity=5, refill_rate=2.0, window_size_ms=1000)
    strategy = StrategyFactory.create(AlgorithmType.TOKEN_BUCKET, config)
    limiter = RateLimiter(strategy)

    print("=== Token Bucket (capacity=5, refill=2/sec) ===")
    for i in range(8):
        result = limiter.allow("user_1")
        status = "ALLOWED" if result else "REJECTED"
        print(f"  Request {i + 1}: {status}")

    # Wait for some tokens to refill
    print("  (waiting 1.5 seconds for refill...)")
    time.sleep(1.5)

    for i in range(4):
        result = limiter.allow("user_1")
        status = "ALLOWED" if result else "REJECTED"
        print(f"  Request {i + 9}: {status}")

    # Sliding Window: 3 requests per 2-second window
    print("\n=== Sliding Window (capacity=3, window=2s) ===")
    sw_config = RateLimitConfig(capacity=3, refill_rate=1.0, window_size_ms=2000)
    sw_strategy = StrategyFactory.create(AlgorithmType.SLIDING_WINDOW, sw_config)
    sw_limiter = RateLimiter(sw_strategy)

    for i in range(5):
        result = sw_limiter.allow("user_2")
        status = "ALLOWED" if result else "REJECTED"
        print(f"  Request {i + 1}: {status}")

    print("  (waiting 2.1 seconds for window to slide...)")
    time.sleep(2.1)

    for i in range(3):
        result = sw_limiter.allow("user_2")
        status = "ALLOWED" if result else "REJECTED"
        print(f"  Request {i + 6}: {status}")

    print("\nAll operations completed successfully.")
```

Common Mistakes
- ✗ Using a fixed window instead of sliding. Requests at the boundary of two windows can double your actual rate. A client can send N requests at 11:59 and N more at 12:00.
- ✗ Running a timer thread to refill tokens. Wasteful. Compute the refill lazily when allow() is called by checking elapsed time.
- ✗ Forgetting thread safety on the token count. Two concurrent requests can both read tokens=1 and both pass, exceeding the limit.
- ✗ Not cleaning up old timestamps in sliding window. Without pruning, memory grows without bound for high-traffic clients.
Key Points
- ✓ Token bucket allows short bursts up to bucket capacity, then throttles to the refill rate. Good for APIs that tolerate spikes.
- ✓ Sliding window counts requests in a rolling time window. Stricter than token bucket but uses more memory per client.
- ✓ Strategy pattern means you pick the algorithm at config time. Changing from token bucket to sliding window requires zero code changes in the limiter itself.
- ✓ Lazy token refill avoids background threads. Calculate tokens earned since last request on each allow() call.