Bulkhead Pattern
Isolate resources so a failure in one part of the system can't drain everything. Give each downstream its own thread pool, connection pool, or rate limit so one slow downstream can't tie up resources needed by healthy ones. Named after ship hulls, which are divided into watertight compartments to prevent total flooding.
What it is
The bulkhead pattern is a way of arranging your service so that one slow or broken thing cannot drag the rest of the service down with it. The name comes from ships. The hull of a ship is divided into sealed compartments. If one compartment springs a leak, the water fills only that compartment. The other compartments stay dry and the ship keeps floating.
In a service, the "compartments" are pools of finite resources. The most common ones are threads, database connections, or simply a counter of how many calls can be in flight at once. The rule is: every external thing your service calls gets its own compartment. The payment API gets its own pool of 20 slots. The user service gets its own pool of 30. The recommendations service gets its own pool of 10. None of them shares.
If the payment API suddenly slows down, the only thing that fills up is the payment pool. Calls to the user service and the recommendations service keep going through their own pools, untouched.
Why this matters: the failure without it
The default setup most services start with is one shared pool. Every incoming request grabs a thread from one pool, calls whichever downstream it needs, and gives the thread back when done. That works fine until one downstream gets slow.
The payment service starts taking 30 seconds per call. Every thread that picks up a payment request sits and waits. Pretty soon all 50 threads in the pool are stuck on payment calls. New requests for the user service and the recommendations service show up, but there are no threads to handle them. Both services start timing out from the outside even though they were healthy the whole time.
This is what people mean when they say "one slow dependency took down the whole service". The slow dependency did not actually break anything in the calling service. It just held the shared resource long enough that nothing else could get any.
How the bulkhead fixes it
Give each downstream its own pool. Now the slow one can fill up its own pool all it wants, but the other pools are unaffected.
When the payment pool fills up, new payment requests fail fast (or wait briefly with a tight deadline) instead of grabbing a thread out of a shared pile. The 30 user-service slots and the 10 recs slots are still available to anyone who needs them. The blast radius of the payment slowdown is bounded to "payment requests fail or queue", not "the whole service falls over".
That is the core of the pattern. Each downstream lives in its own compartment. One compartment can flood without sinking the ship.
Two ways to build the compartments
There are two common implementations and the choice depends on your language and how worried you are about the downstream call leaking resources.
A separate thread pool per downstream. This is the strongest version. Each downstream gets its own dedicated set of threads. Even if the downstream's client library has a memory leak, hangs in a tight loop, or behaves badly in some other way, the damage is limited to that pool's threads. The cost is more threads to manage, more memory.
A shared thread pool with a separate counter (semaphore) per downstream. Threads come from one common pool, but each downstream has its own permit count. Before a thread can call the payment API it has to grab a payment permit. Permits run out before the thread pool runs out, so a slow payment downstream still cannot consume more than its share of threads. Lower overhead, slightly weaker isolation. This is the default choice for most services and it is the only choice that makes sense in Go or in async Python where threads or tasks are cheap to create.
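The semaphore flavor can be sketched with plain java.util.concurrent. This is a minimal illustration, not a production implementation; the permit count and the "payment" example in main are the illustrative numbers from above:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class SimpleBulkhead {
    private final Semaphore permits;

    public SimpleBulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the downstream call if a permit is free, otherwise fails fast.
    public <T> T call(Supplier<T> downstream) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return downstream.get();
        } finally {
            permits.release(); // always release, even when the call throws
        }
    }

    public static void main(String[] args) {
        SimpleBulkhead payment = new SimpleBulkhead(20); // payment's own budget
        System.out.println(payment.call(() -> "charged")); // prints charged
    }
}
```

Note the release in a finally block: if the permit were only released on the success path, every exception would leak a permit and the bulkhead would slowly strangle itself.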
Sizing the compartments
The right size for a bulkhead is "just enough to handle the downstream's normal load, plus a little headroom for bursts". The formula most people use is Little's Law: number of in-flight calls equals throughput times latency.
If the payment API normally handles 100 requests per second and each call takes 200 milliseconds, then at any moment there are about 100 times 0.2 = 20 calls in flight. Set the bulkhead at around 30 to absorb bursts without rejecting healthy traffic.
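The arithmetic above is simple enough to write down as a sizing helper. A sketch; the 1.5x headroom factor is an illustrative choice, not a rule:

```java
public class BulkheadSizing {
    // Little's Law: in-flight calls = throughput (req/s) * latency (s).
    // headroom > 1.0 leaves slack for bursts.
    static int size(double requestsPerSecond, double latencySeconds, double headroom) {
        double inFlight = requestsPerSecond * latencySeconds;
        return (int) Math.ceil(inFlight * headroom);
    }

    public static void main(String[] args) {
        // 100 req/s at 200 ms => ~20 in flight; 1.5x headroom => 30 slots
        System.out.println(size(100, 0.2, 1.5)); // prints 30
    }
}
```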
Get this wrong in either direction and the bulkhead does the wrong thing:
- Too small. Real traffic hits the limit during normal operation and you start rejecting requests that the downstream could have handled.
- Too large. By the time the bulkhead fills, the calling service is already in trouble. The bulkhead provided no protection because it never actually limited anything.
Bulkheads are not enough on their own
A bulkhead by itself still has problems. If individual calls hang forever, the bulkhead just fills up with hung calls and never recovers. If the downstream is fully dead, the bulkhead fills with timing-out calls and the calling service keeps wasting cycles on dead requests.
The bulkhead is one layer in a stack. The full set of layers, from inside out:
- Timeout on every call. Each call has its own deadline. After that deadline, the slot is freed regardless of what the downstream is doing.
- Bulkhead. Caps how many of those calls can be in flight at the same time.
- Circuit breaker. Watches the failure rate. When it gets bad enough, the breaker opens and all new calls fail instantly without even trying. The bulkhead does not fill up because nothing is calling through.
- Retry with backoff. For the calls that do go through and fail in a transient way, give them a second or third try with a delay between attempts.
Each layer covers a weakness in the others. Production-grade resilience uses all four together.
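The two innermost layers can be combined in one small wrapper: a semaphore bulkhead with a brief wait for a slot, plus a per-call deadline that frees the slot no matter what the downstream does. A sketch, assuming a dedicated executor for the blocking call; the 100 ms wait, 2 s deadline, and 20-slot cap are illustrative:

```java
import java.util.concurrent.*;

public class GuardedCall {
    private final Semaphore bulkhead = new Semaphore(20);          // cap on in-flight calls
    private final ExecutorService pool = Executors.newCachedThreadPool();

    public <T> T call(Callable<T> downstream) throws Exception {
        // Layer 2: bulkhead. Wait briefly for a slot, then fail fast.
        if (!bulkhead.tryAcquire(100, TimeUnit.MILLISECONDS)) {
            throw new RejectedExecutionException("bulkhead full");
        }
        try {
            // Layer 1: timeout. The slot is freed after the deadline
            // regardless of what the downstream is doing.
            Future<T> f = pool.submit(downstream);
            try {
                return f.get(2, TimeUnit.SECONDS);
            } catch (TimeoutException e) {
                f.cancel(true); // interrupt the hung call
                throw e;
            }
        } finally {
            bulkhead.release();
        }
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        GuardedCall payments = new GuardedCall();
        System.out.println(payments.call(() -> "ok")); // prints ok
        payments.shutdown();
    }
}
```

The circuit breaker and retry layers would wrap this from the outside; libraries like Resilience4j provide all four as composable decorators.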
When you do not need this
A bulkhead is overhead. The pools have to be sized, monitored, and tuned. Skip it when:
- Your service only calls one downstream. There is nothing to isolate from. The connection pool to that one downstream already does the same job.
- The calls in question are inside the same process and do not own real resources (a function call from one module to another).
- Your service has no downstream calls of any kind, just CPU work.
Reach for it when your service depends on three or more external systems. By that point, the chance that one of them goes slow on any given day is high enough that not isolating them is just betting against history.
Implementations
Each downstream gets its own ExecutorService. If the payment API is slow, its pool fills up but the user-service pool is untouched. Without this, both downstreams share the request-handling pool and the slow one starves the fast one.
```java
import java.util.concurrent.*;

public class ServiceClients {
    // BAD: shared pool, slow downstream blocks calls to fast one
    // private final ExecutorService shared = Executors.newFixedThreadPool(50);

    // GOOD: per-downstream pools
    private final ExecutorService paymentPool =
        Executors.newFixedThreadPool(20, threadFactory("payment"));
    private final ExecutorService userPool =
        Executors.newFixedThreadPool(30, threadFactory("user"));
    private final ExecutorService recsPool =
        Executors.newFixedThreadPool(10, threadFactory("recs"));

    public CompletableFuture<Payment> charge(ChargeRequest req) {
        return CompletableFuture.supplyAsync(() -> paymentApi.charge(req), paymentPool);
    }

    public CompletableFuture<User> getUser(String id) {
        return CompletableFuture.supplyAsync(() -> userApi.get(id), userPool);
    }
}
```

Resilience4j's Bulkhead is built for this. It ships two implementations: SemaphoreBulkhead (caps concurrent calls) and ThreadPoolBulkhead (a separate thread pool). Wrap each downstream with its own instance.
```java
import io.github.resilience4j.bulkhead.*;
import java.time.Duration;
import java.util.function.Supplier;

BulkheadConfig config = BulkheadConfig.custom()
    .maxConcurrentCalls(20)
    .maxWaitDuration(Duration.ofMillis(100))
    .build();

Bulkhead paymentBulkhead = Bulkhead.of("payment", config);

Supplier<Response> decorated = Bulkhead.decorateSupplier(paymentBulkhead,
    () -> paymentApi.call(req));

try {
    Response r = decorated.get();
} catch (BulkheadFullException e) {
    // Bulkhead is full; downstream is overloaded; fall back
    return cachedResponse();
}
```

Key points
- Each downstream gets its own resource budget (threads, connections, semaphore permits).
- When one downstream is slow, only its budget exhausts. Calls to other downstreams keep flowing.
- Without bulkheads, a single thread pool serving all downstreams gets fully tied up by the slow one. Latency cascades.
- Pair with circuit breaker (fail-fast) and timeout (don't wait forever) for full resilience.
- Cost: more threads/connections to manage. The win is bounded blast radius.
Follow-up questions
- Bulkhead vs circuit breaker?
- How big should each bulkhead be?
- Why not just use timeouts?
- What about thread pool isolation specifically?
Gotchas
- One shared pool with no per-downstream cap = slow downstream blocks all calls.
- Bulkhead too small = false rejections under normal traffic.
- Bulkhead too big = downstream still gets overloaded; the bulkhead provided no protection.
- Forgetting to release the semaphore on exception leaks permits permanently.
- Bulkhead without timeout = each in-flight call can still hang forever; combine them.