API Aggregator (Backend-for-Frontend)
Service that calls N downstream services concurrently and merges results into a single response. Common in mobile/web BFFs. Patterns: fan out with errgroup/TaskGroup/gather, per-call timeouts, partial-result tolerance (return what we have if some fail), per-downstream circuit breakers.
What it is
An API aggregator (or BFF, Backend for Frontend) is a service that calls multiple downstream services concurrently and merges their results into a single response. It exists to keep clients (especially mobile) simple: one HTTP call gets everything they need.
The shape: client → aggregator → fan out to N services → merge → respond.
Without the aggregator, the client has to make N calls, manage parallelism on a constrained device, and handle partial failures itself. With it, the aggregator does that work in the data center where it's faster and easier.
What makes a good aggregator
Concurrent fan-out. Total latency is max of downstream latencies, not sum. If user-service is 50ms, cart-service is 80ms, recs-service is 200ms, the aggregator returns at 200ms (the slowest). Sequential calls would take 330ms.
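The max-not-sum arithmetic is easy to demonstrate. A minimal sketch using CompletableFuture, with the 50/80/200ms latencies from above simulated by sleeps (service names and the dedicated three-thread pool are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FanOut {
    // Simulated downstream call with a fixed latency.
    static String slowCall(String name, long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        return name;
    }

    // Fan out three simulated downstreams and return the observed wall-clock latency.
    static long timedFanOutMillis() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            long start = System.nanoTime();
            var user = CompletableFuture.supplyAsync(() -> slowCall("user", 50), pool);
            var cart = CompletableFuture.supplyAsync(() -> slowCall("cart", 80), pool);
            var recs = CompletableFuture.supplyAsync(() -> slowCall("recs", 200), pool);
            CompletableFuture.allOf(user, cart, recs).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Close to the slowest call (~200ms), well under the 330ms sum.
        System.out.println("elapsed ~" + timedFanOutMillis() + "ms");
    }
}
```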
Per-call timeout. Each downstream gets its own deadline. One slow downstream shouldn't push the whole page over budget. The block-level timeout (whole-page deadline) is the upper bound; per-call timeouts cap individual downstreams.
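A per-call timeout with a fallback can be sketched with CompletableFuture.completeOnTimeout; the 100ms budget and the 500ms "slow today" downstream are illustrative numbers:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerCallTimeout {
    // A recs call that is slow today: 500ms against a 100ms per-call budget.
    static List<String> slowRecs() {
        try { Thread.sleep(500); } catch (InterruptedException e) { /* cancelled: fine */ }
        return List.of("item-1");
    }

    static List<String> recsWithTimeout() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            return CompletableFuture
                .supplyAsync(PerCallTimeout::slowRecs, pool)
                .completeOnTimeout(List.of(), 100, TimeUnit.MILLISECONDS) // fall back to empty
                .join();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(recsWithTimeout()); // [] -- the page renders without recs
    }
}
```

The per-call budget (100ms here) should sit comfortably inside the whole-page deadline so one slow downstream can never consume the entire budget.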
Partial-result tolerance. Some downstreams are critical (without user-service, the page is broken). Some are optional (without recs, the page renders without that section). Decide per-downstream; structure the code so optional failures degrade gracefully.
Per-downstream resilience. Each downstream gets its own circuit breaker, its own bulkhead, its own retry. When one is unhealthy, only its calls suffer; calls to healthy ones continue.
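A minimal sketch of a count-based per-downstream breaker. The threshold, demo wiring, and CircuitOpenException name are illustrative; production code typically uses a library such as Resilience4j, which adds half-open probing and time-windowed failure rates:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class Breaker {
    static class CircuitOpenException extends RuntimeException {}

    private final int threshold;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();

    Breaker(int threshold) { this.threshold = threshold; }

    <T> T call(Callable<T> task) throws Exception {
        if (consecutiveFailures.get() >= threshold) {
            throw new CircuitOpenException();   // open: fail fast, no timeout wait
        }
        try {
            T result = task.call();
            consecutiveFailures.set(0);         // success resets the breaker
            return result;
        } catch (Exception e) {
            consecutiveFailures.incrementAndGet();
            throw e;
        }
    }

    // Trip the breaker with three failures, then show the fast-fail path.
    static String demo() {
        var breaker = new Breaker(3);
        for (int i = 0; i < 3; i++) {
            try { breaker.call(() -> { throw new RuntimeException("recs down"); }); }
            catch (Exception e) { /* failure counted */ }
        }
        try {
            return breaker.call(() -> "recs");
        } catch (CircuitOpenException e) {
            return "fallback";                  // instant, not after a timeout
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // fallback
    }
}
```

One breaker instance per downstream is the point: the recs breaker opening has no effect on calls flowing through the user or cart breakers.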
Caching. The single biggest win for aggregator performance. Per-downstream cache (the user object is stable for seconds), page-level cache (the assembled response when client and parameters match), edge cache (CDN-level when applicable).
The structured-concurrency idiom
Modern languages make this easy:
- Java 21+: StructuredTaskScope handles fan-out, cancellation on first failure, and the deadline.
- Go: errgroup.Group with WithContext and SetLimit.
- Python: asyncio.TaskGroup with asyncio.timeout.
- Older Java: CompletableFuture.allOf with manual timeout management.
These primitives bake in the cancellation semantics: when one task fails, siblings are cancelled. When the deadline fires, everything is cancelled. The aggregator code stays clean.
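For the older-Java row above, a sketch of CompletableFuture.allOf with a block-level deadline (service names are illustrative). Note what "manual timeout management" means here: orTimeout fails the composed future, but it does not interrupt the still-running tasks; cancelling them is on you:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AllOfDeadline {
    // allOf joins the fan-out; orTimeout gives the whole block a hard deadline.
    static String load() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            var user = CompletableFuture.supplyAsync(() -> "user-42", pool);
            var cart = CompletableFuture.supplyAsync(() -> "cart-42", pool);
            CompletableFuture.allOf(user, cart)
                .orTimeout(300, TimeUnit.MILLISECONDS) // whole-block deadline
                .join();                               // throws if the deadline fires
            return user.join() + "+" + cart.join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(load()); // user-42+cart-42
    }
}
```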
Partial results in practice
The pattern: separate the critical fan-out from the optional fan-out.
Critical calls go through an all-or-nothing structure (Go's errgroup, Java's ShutdownOnFailure scope or allSuccessfulOrThrow-style joiner): any single failure fails the whole. These are the calls that produce data the page absolutely needs.
Optional calls run independently. Use gather(return_exceptions=True) or try/catch per call. Failure produces a default (empty list, null) rather than an error.
This split is product-driven: ask the designer "what does the page do if recs is unavailable?" If the answer is "hide the section", recs is optional. If it's "show an error", it's critical.
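The split can also be sketched with plain CompletableFuture (hypothetical user/cart/recs calls): critical futures propagate failure through join, while the optional one falls back through exceptionally:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PartialResults {
    record Page(String user, String cart, List<String> recs) {}

    static Page load() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // Critical: a failure in either of these fails the whole page via join().
            var user = CompletableFuture.supplyAsync(() -> "user-42", pool);
            var cart = CompletableFuture.supplyAsync(() -> "cart-42", pool);
            // Optional: a recs failure degrades to an empty section, not an error.
            var recs = CompletableFuture
                .<List<String>>supplyAsync(() -> { throw new RuntimeException("recs down"); }, pool)
                .exceptionally(e -> List.of());
            return new Page(user.join(), cart.join(), recs.join());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(load()); // the page renders with an empty recs section
    }
}
```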
Caching
Three places to cache:
Per-downstream: cache the result of each downstream call by its inputs. The user object for userId 42 is the same for some seconds. Cheap to maintain (key by call inputs), invalidated when the underlying data changes.
Page-level: the assembled response if the inputs match exactly. Higher hit rate but harder to invalidate (any underlying change invalidates).
CDN/edge: for public or weakly-personalised pages. The aggregator returns cacheable headers; the CDN serves repeat requests without ever touching the aggregator. Free if applicable.
Most teams underuse the per-downstream cache because it requires plumbing (cache key derivation, TTL, invalidation). It pays off heavily on traffic-heavy aggregators.
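A per-downstream cache keyed by call inputs can be sketched with a ConcurrentHashMap and a TTL. Names and the 5-second TTL are illustrative; this sketch checks expiry lazily on read and has no explicit invalidation hook, which a real deployment would need:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class TtlCache<K, V> {
    private record Entry<V>(V value, long expiresAtNanos) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;

    TtlCache(long ttlMillis) { this.ttlNanos = ttlMillis * 1_000_000; }

    // Key by the downstream call's inputs; serve cached values until the TTL expires.
    V get(K key, Function<K, V> loader) {
        Entry<V> e = entries.get(key);
        if (e != null && System.nanoTime() < e.expiresAtNanos) return e.value(); // hit
        V v = loader.apply(key);                                                 // miss: call downstream
        entries.put(key, new Entry<>(v, System.nanoTime() + ttlNanos));
        return v;
    }

    // Two lookups for the same user produce a single downstream call.
    static int demoDownstreamCalls() {
        var calls = new AtomicInteger();
        var userCache = new TtlCache<String, String>(5_000); // user object stable for ~5s
        Function<String, String> fetchUser = id -> { calls.incrementAndGet(); return "user-" + id; };
        userCache.get("42", fetchUser);
        userCache.get("42", fetchUser);                      // cache hit, no second call
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("downstream calls: " + demoDownstreamCalls()); // 1
    }
}
```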
When an aggregator is unnecessary
Single-page apps with a thin GraphQL layer (the GraphQL server does similar fan-out, just with a different API shape). Internal services that already get exactly the data they need. Low-traffic apps where the fan-out per request doesn't matter.
For mobile-facing or high-traffic web APIs with multiple backend systems, the aggregator pays for itself in client simplicity and parallelism wins.
Implementations
All three calls run concurrently in the StructuredTaskScope. Total latency is max of the three, not sum. With joinUntil, the whole block has a hard deadline.
record OrderPage(User user, Cart cart, List<Item> recs) {}

OrderPage loadOrderPage(String userId) throws Exception {
    // ShutdownOnFailure cancels the sibling tasks as soon as one fails.
    try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
        var userTask = scope.fork(() -> userService.fetch(userId));
        var cartTask = scope.fork(() -> cartService.fetch(userId));
        var recsTask = scope.fork(() -> recsService.fetch(userId));

        scope.joinUntil(Instant.now().plus(Duration.ofMillis(300)));
        scope.throwIfFailed();

        return new OrderPage(userTask.get(), cartTask.get(), recsTask.get());
    }
}

Each downstream has its own breaker. When recs is unhealthy, calls to it fail fast (no timeout wait). The page renders without recs immediately, not 300ms slower.
OrderPage loadOrderPage(String userId) throws Exception {
    List<Item> recs;
    try {
        recs = recsBreaker.call(() -> recsService.fetch(userId));
    } catch (CircuitOpenException e) {
        recs = List.of(); // fall back instantly
    }

    // ... user and cart with their own breakers ...

    return new OrderPage(user, cart, recs);
}

Key points
- Fan out concurrently: total latency = max of downstreams, not sum.
- Per-call timeout: one slow downstream shouldn't drag the whole response.
- Partial-result tolerance: decide which downstreams are critical, which are optional.
- Per-downstream circuit breaker: when one is unhealthy, fail it fast and degrade gracefully.
- Cache aggressively: page-level cache and per-downstream cache reduce fan-out cost.
Follow-up questions
- What's the total-latency calculation for an aggregator?
- How to decide which downstreams are critical vs optional?
- Should the aggregator be its own service?
- How are aggregator responses cached?
Gotchas
- Sequential awaits instead of concurrent fan-out: latency becomes sum, not max.
- No per-call timeout: one slow downstream blocks the whole page.
- Treating all downstreams as critical: any single failure breaks the page.
- No circuit breaker per downstream: calls keep timing out against an unhealthy service.
- No caching: every request fans out to all backends and the databases get hammered.