API Gateway
Three structural patterns in one system. Proxy guards the door, Adapter translates protocols, and Facade hides the mess of microservices behind a single clean endpoint.
Key Abstractions
- Gateway: Proxy that intercepts all client requests and applies middleware before forwarding
- ServiceAdapter: Adapter interface wrapping different backend protocols behind a uniform handle() method
- ServiceRegistry: Maps route paths to their target service adapters for request routing
- Middleware: Pluggable request processor for auth, rate limiting, logging, and caching
- ResponseAggregator: Facade that orchestrates calls to multiple services and returns a single combined response
Class Diagram
The Key Insight
Most engineers think of an API Gateway as a reverse proxy. That is one third of the picture. The real design has three structural patterns stacked on top of each other, each solving a different problem.
The Proxy Pattern handles the "invisible wall" between clients and services. Clients call the gateway using the exact same HTTP interface they would use to call the backend directly. They have no idea auth checks, rate limiting, and caching are happening. That transparency is the whole point of a proxy.
The Adapter Pattern solves protocol heterogeneity. Your microservices are not uniform. One team built their service with REST. Another uses gRPC. The legacy inventory system still speaks SOAP because nobody wants to rewrite it. Without adapters, your gateway needs protocol-specific routing logic for every backend. Adapters wrap each service behind a single handle(request) interface. The gateway routes uniformly.
The Facade Pattern handles response aggregation. A mobile client showing an order details screen needs data from three services. Making three separate API calls over a cellular connection is slow and fragile. The facade makes one call to the gateway, the gateway fans out to three services on a fast internal network, and returns one combined payload.
Requirements
Functional
- Route incoming requests to the correct backend service based on path
- Support multiple backend protocols (REST, RPC, SOAP/XML) through a uniform interface
- Apply a configurable middleware pipeline: authentication, rate limiting, logging, caching
- Aggregate responses from multiple services into a single response for the client
- Register and deregister services at runtime without code changes
Non-Functional
- Middleware execution order must be deterministic and configurable
- Rate limiting should be per-client, not global
- Cache should only apply to GET requests with 200 responses
- Partial failure in aggregation should return available data, not a blanket error
Design Decisions
Why Proxy for the gateway instead of a simple router?
A router forwards requests. That is all it does. A proxy forwards requests while adding behavior that is invisible to both sides. The client sends the same HTTP request it would send to the backend directly. The backend receives what looks like a normal incoming request. Neither side knows about auth checks, rate limiting, request logging, or response caching happening in between.
This matters because cross-cutting concerns multiply fast. Today it is auth and logging. Next quarter it is request tracing, header enrichment, and A/B routing. The proxy pattern lets you stack these transparently. A simple router would force you to either push this logic into every service (duplication) or make clients aware of it (coupling).
Why Adapter instead of making all services speak the same protocol?
You do not control legacy services. The inventory system was built in 2008 and speaks SOAP. Rewriting it is a six-month project nobody will fund. The payments team chose gRPC for performance and they are not switching.
The Adapter pattern wraps each backend behind the same handle(request) -> response interface. The gateway does not care if a service speaks REST, RPC, or carrier pigeon. It calls handle() and gets a Response back. When a new team builds a GraphQL service, you write one new adapter class. Nothing else changes.
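As a sketch of that claim, here is roughly what one new adapter class could look like. The GraphQLAdapter name and its path-to-query translation are illustrative assumptions, and minimal Request/Response/ServiceAdapter types are restated so the snippet runs on its own:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
import json


@dataclass
class Request:
    path: str
    method: str = "GET"
    headers: dict = field(default_factory=dict)


@dataclass
class Response:
    status_code: int
    body: str


class ServiceAdapter(ABC):
    @abstractmethod
    def handle(self, request: Request) -> Response: ...


class GraphQLAdapter(ServiceAdapter):
    """Hypothetical adapter: translates a routed path into a GraphQL query."""
    def __init__(self, service_name: str):
        self.service_name = service_name

    def handle(self, request: Request) -> Response:
        # e.g. GET /search -> query { search }
        # (a real adapter would also map params into query arguments)
        resource = request.path.strip("/").split("/")[0]
        query = f"query {{ {resource} }}"
        payload = {"source": self.service_name, "protocol": "GraphQL",
                   "query": query}
        return Response(200, json.dumps(payload))


adapter = GraphQLAdapter("SearchService")
resp = adapter.handle(Request(path="/search"))
print(resp.body)
```

The gateway would register this adapter under a path like any other; nothing in the routing or middleware code changes.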
Why Facade for aggregation instead of client-side orchestration?
If the mobile app calls OrderService, then PaymentService, then UserService to render one screen, that is three round trips over a cellular connection. Each one adds latency and another failure point.
The ResponseAggregator is a facade that takes a list of paths, calls each service on the fast internal network, and merges everything into one response. The client makes one call. Three round trips become one. And if PaymentService is down, the facade returns user and order data with a clear error for the payment portion instead of failing entirely.
Why a middleware chain instead of putting all checks in the Gateway class?
Imagine auth, rate limiting, logging, and caching all inside the Gateway's handle method. That is four responsibilities in one class. Adding request tracing means editing the Gateway. Removing caching means editing the Gateway. Every change risks breaking something unrelated.
A middleware chain separates each concern into its own class. Each middleware does exactly one thing and calls next_handler to pass control forward. You can add, remove, or reorder middlewares by changing a list. The Gateway class itself stays untouched. Need request tracing next sprint? Write a TracingMiddleware and insert it. Done.
Interview Follow-ups
- "How would you handle service discovery?" Replace the static ServiceRegistry with a dynamic one backed by Consul or etcd. Services register themselves on startup and deregister on shutdown. The gateway resolves routes at request time instead of boot time.
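A minimal sketch of that idea, with an in-memory dict and TTL standing in for a Consul/etcd client; DynamicServiceRegistry and its method names are assumptions for illustration:

```python
from __future__ import annotations
import time


class DynamicServiceRegistry:
    """Sketch of a registry where services register themselves with a TTL,
    standing in for a Consul/etcd-backed service catalog."""
    def __init__(self, ttl_seconds: float = 30.0):
        self.ttl_seconds = ttl_seconds
        # path -> (service address, last heartbeat timestamp)
        self._entries: dict[str, tuple[str, float]] = {}

    def register(self, path: str, address: str) -> None:
        self._entries[path] = (address, time.time())

    def heartbeat(self, path: str) -> None:
        # Services refresh their entry periodically; stale entries expire
        if path in self._entries:
            address, _ = self._entries[path]
            self._entries[path] = (address, time.time())

    def deregister(self, path: str) -> None:
        self._entries.pop(path, None)

    def resolve(self, path: str) -> str | None:
        # Resolved at request time, not boot time: entries whose
        # heartbeats stopped are treated as gone
        entry = self._entries.get(path)
        if entry is None:
            return None
        address, last_seen = entry
        if time.time() - last_seen > self.ttl_seconds:
            return None
        return address


registry = DynamicServiceRegistry(ttl_seconds=30)
registry.register("/users", "http://user-service:8080")
print(registry.resolve("/users"))   # http://user-service:8080
registry.deregister("/users")
print(registry.resolve("/users"))   # None
```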
- "How would you add circuit breaking?" Wrap each ServiceAdapter in a CircuitBreakerAdapter that tracks failure rates. After a threshold, the circuit opens and returns a fallback response without calling the backend. This prevents cascading failures when a downstream service is struggling.
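A sketch of that wrapper, assuming the backend is any callable returning a (status, body) pair; the class name, threshold, and reset window are illustrative:

```python
import time


class CircuitBreakerAdapter:
    """Sketch of circuit breaking around a backend call."""
    def __init__(self, inner, failure_threshold: int = 3,
                 reset_seconds: float = 30.0):
        self.inner = inner                  # callable: request -> (status, body)
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self.failures = 0
        self.opened_at = None

    def handle(self, request):
        # Open circuit: fail fast with a fallback until the reset window elapses
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_seconds:
                return (503, "Circuit open: fallback response")
            # Half-open: let one trial request through
            self.opened_at = None
            self.failures = 0
        try:
            status, body = self.inner(request)
        except Exception:
            status, body = 500, "backend error"
        if status >= 500:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()    # trip the breaker
        else:
            self.failures = 0                   # success resets the count
        return (status, body)


def flaky_backend(request):
    raise ConnectionError("service down")

breaker = CircuitBreakerAdapter(flaky_backend, failure_threshold=3)
for _ in range(3):
    breaker.handle("req")       # three straight failures trip the breaker
print(breaker.handle("req"))    # (503, 'Circuit open: fallback response')
```

Because the breaker exposes the same handle() shape as the adapters it wraps, the gateway does not need to know it is there.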
- "How would you handle WebSocket connections?" WebSockets need a persistent connection, not request-response routing. Add a WebSocketProxy alongside the HTTP Gateway that upgrades connections and maintains session affinity to the correct backend. The middleware pipeline still applies on the initial handshake.
- "How would you implement canary deployments?" Add a RoutingMiddleware that reads a percentage header or user segment and forwards to either the stable or canary adapter for the same service. The ServiceRegistry holds both adapters under different keys, and the middleware picks one based on the routing rules.
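A sketch of the routing rule, with plain callables standing in for the stable and canary adapters; the canary_router name, the percentage rule, and the X-Canary header are assumptions:

```python
import random


def canary_router(stable_handler, canary_handler, canary_percent: int):
    """Sketch of percentage-based canary routing. A real RoutingMiddleware
    would sit in the gateway chain; here the handlers are plain callables."""
    def route(request: dict):
        # An explicit override header forces the canary (useful for testing)
        if request.get("headers", {}).get("X-Canary") == "always":
            return canary_handler(request)
        # Roll the dice per request; sticky per-user routing would
        # hash a user id instead
        if random.randint(1, 100) <= canary_percent:
            return canary_handler(request)
        return stable_handler(request)
    return route


stable = lambda req: "v1 response"
canary = lambda req: "v2 response"
handler = canary_router(stable, canary, canary_percent=10)
print(handler({"headers": {"X-Canary": "always"}}))   # v2 response
```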
Code Implementation
from __future__ import annotations
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable
import json
import time


@dataclass
class Request:
    path: str
    method: str = "GET"
    headers: dict = field(default_factory=dict)
    body: str = ""
    params: dict = field(default_factory=dict)


@dataclass
class Response:
    status_code: int
    body: str
    headers: dict = field(default_factory=dict)


# --------------- Service Adapters (Adapter Pattern) ---------------

class ServiceAdapter(ABC):
    @abstractmethod
    def handle(self, request: Request) -> Response:
        ...


class RestAdapter(ServiceAdapter):
    """Adapter for REST-based backend services."""
    def __init__(self, service_name: str):
        self.service_name = service_name

    def handle(self, request: Request) -> Response:
        # Simulates translating to REST and calling the backend
        payload = {"source": self.service_name, "protocol": "REST",
                   "method": request.method, "path": request.path}
        return Response(200, json.dumps(payload))


class RpcAdapter(ServiceAdapter):
    """Adapter for gRPC-style RPC backend services."""
    def __init__(self, service_name: str):
        self.service_name = service_name

    def handle(self, request: Request) -> Response:
        # Translates the standard request into an RPC call format
        payload = {"source": self.service_name, "protocol": "RPC",
                   "procedure": f"{request.method}_{request.path.strip('/')}"}
        return Response(200, json.dumps(payload))


class LegacyAdapter(ServiceAdapter):
    """Adapter for SOAP/XML legacy backend services."""
    def __init__(self, service_name: str):
        self.service_name = service_name

    def handle(self, request: Request) -> Response:
        # Wraps the request into a SOAP envelope before forwarding
        soap_body = (f"<Envelope><Body><{self.service_name}Request>"
                     f"<path>{request.path}</path>"
                     f"</{self.service_name}Request></Body></Envelope>")
        payload = {"source": self.service_name, "protocol": "SOAP",
                   "translated_body": soap_body}
        return Response(200, json.dumps(payload))


# --------------- Service Registry ---------------

class ServiceRegistry:
    def __init__(self):
        self._routes: dict[str, ServiceAdapter] = {}

    def register(self, path: str, adapter: ServiceAdapter) -> None:
        self._routes[path] = adapter

    def get_adapter(self, path: str) -> ServiceAdapter | None:
        # Exact match first, then prefix match
        if path in self._routes:
            return self._routes[path]
        for route, adapter in self._routes.items():
            if path.startswith(route):
                return adapter
        return None


# --------------- Middleware (Chain of Responsibility) ---------------

# A handler is anything callable that takes a Request and returns a Response
Handler = Callable[[Request], Response]


class Middleware(ABC):
    @abstractmethod
    def process(self, request: Request, next_handler: Handler) -> Response:
        ...


class AuthMiddleware(Middleware):
    def process(self, request: Request, next_handler: Handler) -> Response:
        if "Authorization" not in request.headers:
            return Response(401, "Unauthorized: missing Authorization header")
        return next_handler(request)


class RateLimitMiddleware(Middleware):
    def __init__(self, max_requests: int = 5, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._counts: dict[str, list[float]] = {}

    def process(self, request: Request, next_handler: Handler) -> Response:
        client = request.headers.get("Authorization", "anonymous")
        now = time.time()
        timestamps = self._counts.setdefault(client, [])
        # Evict old timestamps outside the window
        timestamps[:] = [t for t in timestamps if now - t < self.window_seconds]
        if len(timestamps) >= self.max_requests:
            return Response(429, "Rate limit exceeded")
        timestamps.append(now)
        return next_handler(request)


class LoggingMiddleware(Middleware):
    def process(self, request: Request, next_handler: Handler) -> Response:
        print(f" [LOG] {request.method} {request.path}")
        response = next_handler(request)
        print(f" [LOG] Response status: {response.status_code}")
        return response


class CacheMiddleware(Middleware):
    def __init__(self):
        self._cache: dict[str, Response] = {}

    def process(self, request: Request, next_handler: Handler) -> Response:
        if request.method == "GET":
            cache_key = request.path
            if cache_key in self._cache:
                print(f" [CACHE] Hit for {cache_key}")
                return self._cache[cache_key]
            response = next_handler(request)
            if response.status_code == 200:
                self._cache[cache_key] = response
            return response
        return next_handler(request)


# --------------- Response Aggregator (Facade Pattern) ---------------

class ResponseAggregator:
    def aggregate(self, paths: list[str], gateway: "Gateway",
                  base_request: Request) -> Response:
        results = {}
        for path in paths:
            req = Request(path=path, method="GET",
                          headers=dict(base_request.headers))
            resp = gateway.handle(req)
            if resp.status_code == 200:
                try:
                    results[path] = json.loads(resp.body)
                except json.JSONDecodeError:
                    results[path] = resp.body
            else:
                # Partial failure: keep the error alongside the successful data
                results[path] = {"error": resp.body, "status": resp.status_code}
        return Response(200, json.dumps(results, indent=2))


# --------------- Gateway (Proxy Pattern) ---------------

class Gateway:
    def __init__(self):
        self.registry = ServiceRegistry()
        self._middlewares: list[Middleware] = []

    def add_middleware(self, middleware: Middleware) -> None:
        self._middlewares.append(middleware)

    def handle(self, request: Request) -> Response:
        # Build the middleware chain from the inside out.
        # The innermost handler forwards to the actual service adapter.
        def forward(req: Request) -> Response:
            adapter = self.registry.get_adapter(req.path)
            if adapter is None:
                return Response(404, f"No service registered for {req.path}")
            return adapter.handle(req)

        handler = forward
        for mw in reversed(self._middlewares):
            # Capture mw in a closure properly
            handler = self._wrap(mw, handler)
        return handler(request)

    @staticmethod
    def _wrap(mw: Middleware, next_handler: Handler) -> Handler:
        def wrapped(req: Request) -> Response:
            return mw.process(req, next_handler)
        return wrapped


if __name__ == "__main__":
    # 1. Create gateway and register services with different adapters
    gw = Gateway()
    gw.registry.register("/users", RestAdapter("UserService"))
    gw.registry.register("/payments", RpcAdapter("PaymentService"))
    gw.registry.register("/inventory", LegacyAdapter("InventoryService"))

    # 2. Add middleware chain: logging -> auth -> rate_limit -> cache
    # Execution order: logging first, then auth, then rate limit, then cache
    gw.add_middleware(LoggingMiddleware())
    gw.add_middleware(AuthMiddleware())
    gw.add_middleware(RateLimitMiddleware(max_requests=4, window_seconds=60))
    gw.add_middleware(CacheMiddleware())

    print("=== Test 1: Authenticated GET /users (REST adapter) ===")
    resp = gw.handle(Request(path="/users", method="GET",
                             headers={"Authorization": "Bearer token123"}))
    print(f" Status: {resp.status_code}")
    print(f" Body: {resp.body}\n")

    print("=== Test 2: Unauthenticated request (rejected by auth) ===")
    resp = gw.handle(Request(path="/users", method="GET"))
    print(f" Status: {resp.status_code}")
    print(f" Body: {resp.body}\n")

    print("=== Test 3: Authenticated GET /payments (RPC adapter) ===")
    resp = gw.handle(Request(path="/payments", method="GET",
                             headers={"Authorization": "Bearer token123"}))
    print(f" Status: {resp.status_code}")
    print(f" Body: {resp.body}\n")

    print("=== Test 4: Authenticated GET /inventory (Legacy SOAP adapter) ===")
    resp = gw.handle(Request(path="/inventory", method="GET",
                             headers={"Authorization": "Bearer token123"}))
    print(f" Status: {resp.status_code}")
    print(f" Body: {resp.body}\n")

    print("=== Test 5: Cache hit on repeated GET /users ===")
    resp = gw.handle(Request(path="/users", method="GET",
                             headers={"Authorization": "Bearer token123"}))
    print(f" Status: {resp.status_code}")
    print(f" Body: {resp.body}\n")

    print("=== Test 6: Rate limit exceeded (5th authenticated request, limit is 4) ===")
    resp = gw.handle(Request(path="/payments", method="GET",
                             headers={"Authorization": "Bearer token123"}))
    print(f" Status: {resp.status_code}")
    print(f" Body: {resp.body}\n")

    print("=== Test 7: Facade aggregation - /users + /payments in one call ===")
    aggregator = ResponseAggregator()
    # Use a fresh gateway without rate limiting for a clean aggregation demo
    gw2 = Gateway()
    gw2.registry.register("/users", RestAdapter("UserService"))
    gw2.registry.register("/payments", RpcAdapter("PaymentService"))
    gw2.add_middleware(LoggingMiddleware())
    gw2.add_middleware(AuthMiddleware())
    gw2.add_middleware(CacheMiddleware())
    base = Request(path="/", headers={"Authorization": "Bearer token123"})
    combined = aggregator.aggregate(["/users", "/payments"], gw2, base)
    print(f" Status: {combined.status_code}")
    print(f" Combined body:\n{combined.body}")
Common Mistakes
- ✗ Putting business logic in the gateway. The gateway routes and guards; business logic belongs in the services.
- ✗ Hardcoding service URLs. Use a registry so services can be added, removed, or moved without code changes.
- ✗ Running middleware in the wrong order. Rate limiting before auth means unauthenticated users consume your rate limit budget.
- ✗ Not handling partial failures in aggregation. If one of three services fails, the facade should return partial data, not crash.
Key Points
- ✓ The Gateway is a proxy: same interface as the backend services, but with cross-cutting concerns layered in.
- ✓ Adapters let you integrate REST, RPC, and legacy SOAP services without the gateway knowing the difference.
- ✓ The middleware chain is ordered: in the demo, logging runs first, then auth, then rate limiting, then the cache check, then the forward to the adapter.
- ✓ Facade aggregation means clients make one call instead of orchestrating three service calls themselves.