Rate Limiter
Token bucket for smooth throughput, sliding window for strict counting. Strategy pattern lets you swap algorithms without rewiring the system.
Key Abstractions
- RateLimiter: Facade that checks if a request is allowed and delegates to the active strategy
- RateLimitStrategy: Interface for swappable algorithms. Each strategy decides allow or reject independently.
- TokenBucketStrategy: Refills tokens at a fixed rate; each request costs one token. Allows controlled bursts.
- SlidingWindowStrategy: Tracks timestamps in a rolling window. Strict count-based limiting with no burst allowance.
- RateLimitConfig: Immutable config holding capacity, refill rate, and window size parameters
Class Diagram
How It Works
Rate limiting is about saying "no" gracefully. You have a finite resource (API endpoints, database connections, message queues) and you need to prevent any single client from consuming more than their fair share. The question is how you count and enforce those limits.
Token bucket is the most popular approach in production systems. Picture a bucket that holds, say, 10 tokens. Each request costs one token. A background process (or lazy calculation) adds tokens back at a steady rate. If the bucket is empty, the request gets rejected. This naturally allows short bursts (a full bucket lets a client fire 10 requests at once) while enforcing a long-term average rate.
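The refill arithmetic can be sketched as a pure function (an illustrative sketch with explicit timestamps rather than a real clock; the `refill` name is hypothetical, not part of the implementation below):

```python
def refill(tokens: float, last_refill: float, now: float,
           rate: float, capacity: float) -> float:
    """Tokens earned since the last check, capped at bucket capacity."""
    return min(capacity, tokens + (now - last_refill) * rate)

# Bucket of 10, refilling 2 tokens/sec, empty at t=0.
print(refill(tokens=0.0, last_refill=0.0, now=1.5, rate=2.0, capacity=10.0))
# 1.5 seconds * 2 tokens/sec = 3.0 tokens

# After a long idle stretch the bucket is simply full again, in one step.
print(refill(0.0, 0.0, now=60.0, rate=2.0, capacity=10.0))  # capped at 10.0
```

Because the calculation depends only on elapsed time, it works the same whether it runs every request or once an hour.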
Sliding window is the stricter alternative. You keep a log of timestamps for each request. When a new request comes in, you count how many timestamps fall within the last N seconds. If you're at the limit, reject. No bursts allowed. The downside is memory: you're storing a timestamp per request instead of a single counter.
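The rolling count can be sketched the same way (illustrative, with timestamps passed in explicitly; `allow` here is a standalone helper, not the class method from the implementation section):

```python
from collections import deque

def allow(timestamps: deque, now: float, window: float, limit: int) -> bool:
    """Prune entries older than the window, then check the count."""
    while timestamps and timestamps[0] <= now - window:
        timestamps.popleft()
    if len(timestamps) < limit:
        timestamps.append(now)
        return True
    return False

log = deque()
# Limit 3 per 60-second window.
print([allow(log, t, 60.0, 3) for t in (0.0, 1.0, 2.0, 3.0)])
# [True, True, True, False] - the fourth request exceeds the count
print(allow(log, 61.0, 60.0, 3))  # True - the oldest entries have expired
```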
Requirements
Functional
- allow(clientId) returns true if the request should proceed, false if it should be throttled
- Support per-client rate tracking (different clients have independent limits)
- Support at least two algorithms: token bucket and sliding window
- Algorithms should be swappable at configuration time
Non-Functional
- O(1) for token bucket decisions, O(n) worst case for sliding window pruning (amortized O(1))
- Thread-safe for concurrent request processing
- No background threads for token refill. Lazy computation is sufficient and simpler.
Design Decisions
Why token bucket as the primary algorithm?
Token bucket gives you two useful properties in one package. First, it enforces a long-term average rate (the refill rate). Second, it permits controlled bursts (up to bucket capacity). Most real APIs want this behavior. A mobile app might batch several API calls on launch, and that's fine as long as the average stays within bounds. Sliding window would reject those burst requests even though the overall rate is acceptable.
Why not fixed window counting?
Fixed window has the boundary burst problem. Imagine a limit of 100 requests per minute, window starting at :00. A client sends 100 requests at :59, then 100 more at :01. That's 200 requests in 2 seconds, double the intended rate. Sliding window solves this by always looking at the last 60 seconds from "right now." Token bucket solves it differently by making burst capacity explicit. Fixed window is simpler to implement but the edge case is a real production problem.
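A toy counter makes the boundary problem concrete (a hedged sketch with made-up names and an injected clock):

```python
def fixed_window_allow(counts: dict[int, int], now: float,
                       window: float, limit: int) -> bool:
    """Naive fixed-window counter: buckets requests by window index."""
    bucket = int(now // window)  # window index: minute 0, minute 1, ...
    if counts.get(bucket, 0) < limit:
        counts[bucket] = counts.get(bucket, 0) + 1
        return True
    return False

counts: dict[int, int] = {}
limit, window = 100, 60.0

# 100 requests at t=59s (end of minute 0) all pass...
late_burst = sum(fixed_window_allow(counts, 59.0, window, limit) for _ in range(100))
# ...and 100 more at t=61s (start of minute 1) also pass.
early_burst = sum(fixed_window_allow(counts, 61.0, window, limit) for _ in range(100))
print(late_burst + early_burst)  # 200 requests allowed within 2 seconds
```

A sliding window over the same stream would reject the second burst, because all 200 timestamps fall inside one 60-second span.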
Why lazy refill instead of a timer thread?
A timer thread seems natural. Every second, iterate all buckets and add tokens. But if you have 100,000 active clients, you're updating 100,000 counters every tick even when most of them aren't making requests. Lazy refill computes the tokens earned since the last call at request time. You do a tiny bit of math per request, but you never waste work on idle clients. Less code, less CPU, fewer concurrency headaches.
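Back-of-the-envelope numbers for the workload described above (the activity figures are illustrative assumptions, not measurements):

```python
clients = 100_000   # registered clients with bucket state
active = 50         # clients actually sending requests right now
seconds = 60

# Timer thread: touch every bucket on every one-second tick, active or not.
timer_updates = clients * seconds

# Lazy refill: one small calculation per request; idle clients cost nothing.
requests_per_active_client = 10
lazy_updates = active * requests_per_active_client

print(timer_updates)  # 6,000,000 counter updates per minute
print(lazy_updates)   # 500
```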
Why Strategy + Factory over a simple if/else?
Right now we have two algorithms. But production systems often add more: leaky bucket, sliding window counter (approximate), adaptive rate limiting. Each new algorithm shouldn't require touching the RateLimiter class. Strategy isolates each algorithm. Factory encapsulates the mapping from config to strategy. Adding a new algorithm means writing a new class and one new case in the factory. Nothing else changes.
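As a sketch of that extension path, assuming a trimmed-down version of the Strategy interface, a hypothetical `NoOpStrategy` (e.g. for disabling limits in dev environments) slots in with one new class and one factory branch:

```python
from abc import ABC, abstractmethod

class RateLimitStrategy(ABC):
    @abstractmethod
    def allow(self, client_id: str) -> bool: ...

class NoOpStrategy(RateLimitStrategy):
    """Hypothetical new algorithm: never throttles."""
    def allow(self, client_id: str) -> bool:
        return True

# The factory gains one branch; the RateLimiter facade is untouched.
def create(algorithm: str) -> RateLimitStrategy:
    if algorithm == "noop":
        return NoOpStrategy()
    raise ValueError(f"Unknown algorithm: {algorithm}")

print(create("noop").allow("user_1"))  # True
```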
Interview Follow-ups
- "How would you handle distributed rate limiting?" Move the token state to a shared store like Redis. Use Lua scripting for atomic read-modify-write of the token count. Redis MULTI/EXEC also works, but a Lua script runs as a single atomic operation in Redis's single-threaded command loop, which eliminates read-modify-write races.
- "What about rate limiting by IP vs. by user vs. by API key?" The client_id parameter is already abstract. You extract the right identifier in middleware and pass it to allow(). You can even compose limiters (global + per-user) by chaining multiple checks.
- "How would you communicate limits to clients?" HTTP headers: X-RateLimit-Limit for the max, X-RateLimit-Remaining for current tokens, X-RateLimit-Reset for when the window resets or tokens refill. Return 429 Too Many Requests when rejected.
- "What if you need different limits for different endpoints?" Create a limiter per endpoint (or per endpoint group) with its own config. A map from endpoint pattern to RateLimiter instance handles the routing.
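The last two answers can be sketched together; `SimpleLimiter` below is an illustrative stand-in for the real RateLimiter, not the class from the implementation section:

```python
class SimpleLimiter:
    """Stand-in limiter: allows the first `limit` calls per key."""
    def __init__(self, limit: int):
        self._limit = limit
        self._counts: dict[str, int] = {}

    def allow(self, key: str) -> bool:
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self._limit

# Composition: a request must pass both the global and the per-user limiter.
# Note the short-circuit: the global limiter is charged even when the
# per-user check rejects, a tradeoff real systems have to decide on.
global_limiter = SimpleLimiter(limit=1000)
per_user_limiter = SimpleLimiter(limit=3)

def allow_request(user_id: str) -> bool:
    return global_limiter.allow("global") and per_user_limiter.allow(user_id)

# Per-endpoint routing: map each endpoint pattern to its own limiter.
endpoint_limiters = {
    "/search": SimpleLimiter(limit=2),    # expensive endpoint, tight limit
    "/status": SimpleLimiter(limit=100),  # cheap endpoint, loose limit
}

print([allow_request("u1") for _ in range(4)])  # [True, True, True, False]
print([endpoint_limiters["/search"].allow("u1") for _ in range(3)])
# [True, True, False]
```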
Code Implementation
```python
from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass
from enum import Enum
from threading import Lock
import time


class AlgorithmType(Enum):
    TOKEN_BUCKET = "token_bucket"
    SLIDING_WINDOW = "sliding_window"


@dataclass(frozen=True)
class RateLimitConfig:
    """Immutable configuration for rate limiting."""
    capacity: int        # Max tokens or max requests in window
    refill_rate: float   # Tokens per second (token bucket only)
    window_size_ms: int  # Window duration in ms (sliding window only)

    def __post_init__(self):
        if self.capacity <= 0:
            raise ValueError("capacity must be positive")


class RateLimitStrategy(ABC):
    @abstractmethod
    def allow(self, client_id: str) -> bool: ...


@dataclass
class _TokenState:
    tokens: float
    last_refill: float


class TokenBucketStrategy(RateLimitStrategy):
    """
    Lazy refill token bucket. No background threads.
    Tokens accumulate based on elapsed time since last check.
    """

    def __init__(self, config: RateLimitConfig):
        self._config = config
        self._buckets: dict[str, _TokenState] = {}
        self._lock = Lock()

    def allow(self, client_id: str) -> bool:
        with self._lock:
            now = time.monotonic()

            if client_id not in self._buckets:
                self._buckets[client_id] = _TokenState(
                    tokens=self._config.capacity, last_refill=now
                )

            state = self._buckets[client_id]

            # Lazy refill: compute tokens earned since last check
            elapsed = now - state.last_refill
            earned = elapsed * self._config.refill_rate
            state.tokens = min(self._config.capacity, state.tokens + earned)
            state.last_refill = now

            if state.tokens >= 1.0:
                state.tokens -= 1.0
                return True
            return False


class SlidingWindowStrategy(RateLimitStrategy):
    """
    Tracks exact timestamps in a rolling window.
    Strict counting with no burst tolerance.
    """

    def __init__(self, config: RateLimitConfig):
        self._config = config
        self._windows: dict[str, deque[float]] = {}
        self._lock = Lock()

    def allow(self, client_id: str) -> bool:
        with self._lock:
            now = time.monotonic()
            window_start = now - (self._config.window_size_ms / 1000.0)

            if client_id not in self._windows:
                self._windows[client_id] = deque()

            timestamps = self._windows[client_id]

            # Prune expired timestamps
            while timestamps and timestamps[0] < window_start:
                timestamps.popleft()

            if len(timestamps) < self._config.capacity:
                timestamps.append(now)
                return True
            return False


class StrategyFactory:
    """Creates the right strategy from algorithm type and config."""

    @staticmethod
    def create(algorithm: AlgorithmType, config: RateLimitConfig) -> RateLimitStrategy:
        if algorithm == AlgorithmType.TOKEN_BUCKET:
            return TokenBucketStrategy(config)
        elif algorithm == AlgorithmType.SLIDING_WINDOW:
            return SlidingWindowStrategy(config)
        raise ValueError(f"Unknown algorithm: {algorithm}")


class RateLimiter:
    """Facade. Delegates all decisions to the active strategy."""

    def __init__(self, strategy: RateLimitStrategy):
        self._strategy = strategy

    def allow(self, client_id: str) -> bool:
        return self._strategy.allow(client_id)


if __name__ == "__main__":
    # Token Bucket: 5 tokens, refill 2 per second
    config = RateLimitConfig(capacity=5, refill_rate=2.0, window_size_ms=1000)
    strategy = StrategyFactory.create(AlgorithmType.TOKEN_BUCKET, config)
    limiter = RateLimiter(strategy)

    print("=== Token Bucket (capacity=5, refill=2/sec) ===")
    for i in range(8):
        result = limiter.allow("user_1")
        status = "ALLOWED" if result else "REJECTED"
        print(f"  Request {i + 1}: {status}")

    # Wait for some tokens to refill
    print("  (waiting 1.5 seconds for refill...)")
    time.sleep(1.5)

    for i in range(4):
        result = limiter.allow("user_1")
        status = "ALLOWED" if result else "REJECTED"
        print(f"  Request {i + 9}: {status}")

    # Sliding Window: 3 requests per 2-second window
    print("\n=== Sliding Window (capacity=3, window=2s) ===")
    sw_config = RateLimitConfig(capacity=3, refill_rate=1.0, window_size_ms=2000)
    sw_strategy = StrategyFactory.create(AlgorithmType.SLIDING_WINDOW, sw_config)
    sw_limiter = RateLimiter(sw_strategy)

    for i in range(5):
        result = sw_limiter.allow("user_2")
        status = "ALLOWED" if result else "REJECTED"
        print(f"  Request {i + 1}: {status}")

    print("  (waiting 2.1 seconds for window to slide...)")
    time.sleep(2.1)

    for i in range(3):
        result = sw_limiter.allow("user_2")
        status = "ALLOWED" if result else "REJECTED"
        print(f"  Request {i + 6}: {status}")

    print("\nAll operations completed successfully.")
```

Common Mistakes
- ✗ Using a fixed window instead of sliding. Requests at the boundary of two windows can double your actual rate. A client can send N requests at 11:59 and N more at 12:00.
- ✗ Running a timer thread to refill tokens. Wasteful. Compute the refill lazily when allow() is called by checking elapsed time.
- ✗ Forgetting thread safety on the token count. Two concurrent requests can both read tokens=1 and both pass, exceeding the limit.
- ✗ Not cleaning up old timestamps in sliding window. Without pruning, memory grows without bound for high-traffic clients.
Key Points
- ✓ Token bucket allows short bursts up to bucket capacity, then throttles to the refill rate. Good for APIs that tolerate spikes.
- ✓ Sliding window counts requests in a rolling time window. Stricter than token bucket but uses more memory per client.
- ✓ Strategy pattern means you pick the algorithm at config time. Changing from token bucket to sliding window requires zero code changes in the limiter itself.
- ✓ Lazy token refill avoids background threads. Calculate tokens earned since last request on each allow() call.