Worker Pool Pattern
A fixed pool of N worker threads/goroutines pulls jobs off a shared queue and processes them in parallel. It bounds resource usage (no risk of running out of memory by spawning a thread per request) and keeps overhead constant under load.
[Diagram: producers → bounded work queue → N workers]
What it is
A worker pool is N long-running worker threads (or goroutines, or processes) that pull jobs off a shared queue and process them (see diagram above). Replace "spawn a thread per task" with "submit a task to the pool" and the worst class of concurrency scaling bugs disappears in one move.
The pool turns unbounded work submission into bounded work execution. Producers submit jobs (and block if the queue is full). N workers run forever, taking jobs from the queue and processing them. The queue absorbs bursts; the workers drain at their natural rate. Memory is bounded by queue size + worker stacks. No surprises.
Why it matters
The most common production scaling bug: spawning a thread (or goroutine) per request "because it's cheap" is the #1 reason small services fall over under load. 10K concurrent threads = 10 GB of stack memory + scheduling chaos. The worker pool is the fix.
This pattern shows up in every concurrent service: request handlers, background jobs, batch processing, message consumers, async I/O fan-out. Knowing the variations and their tradeoffs is the difference between code that scales linearly and code that crashes at 10x load.
How it works: the core loop
```
queue = BoundedQueue(capacity=1000)

# Producers
for job in incoming:
    queue.put(job)  # blocks if queue is full = backpressure

# Workers (N of them)
while True:
    job = queue.take()  # blocks if queue is empty
    if job == SHUTDOWN_SENTINEL:
        break
    process(job)
```
The two key pieces: a bounded queue (so memory doesn't blow up under burst load) and N workers in an infinite loop (so thread creation cost is paid once, not per task).
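A hand-rolled Java version of the same loop, as a minimal sketch (class name, worker count, and queue size are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class HandRolledPool {
    private static final Runnable SHUTDOWN = () -> {};   // sentinel job

    public static void main(String[] args) throws InterruptedException {
        int n = 4;                                        // worker count (illustrative)
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1000); // bounded = backpressure

        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        Runnable job = queue.take();      // blocks if queue is empty
                        if (job == SHUTDOWN) break;       // drain until the sentinel
                        job.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();   // hard-stop path
                }
            });
            workers[i].start();
        }

        for (int i = 0; i < 100; i++) {
            int id = i;
            queue.put(() -> System.out.println("job " + id)); // blocks if queue is full
        }
        for (int i = 0; i < n; i++) queue.put(SHUTDOWN);  // one sentinel per worker
        for (Thread w : workers) w.join();
    }
}
```

The sentinel gives the simplest clean shutdown: producers enqueue one SHUTDOWN per worker, and each worker exits after taking exactly one.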
Sizing the pool
Brian Goetz's formula
N_threads = N_cores × target_utilization × (1 + wait_time / compute_time)
CPU-bound (no waiting): N ≈ core count. I/O-bound (mostly waiting): N can be 10-100× core count. Mixed workloads: measure the actual ratio.
Practical advice: start at core count, ramp up by 2× until throughput plateaus, then back off slightly. Production tuning is empirical; formulas give a starting point.
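As a quick Java illustration of the formula (the wait/compute times below are assumed placeholders; in practice, measure them from traces):

```java
int cores = Runtime.getRuntime().availableProcessors();
double targetUtilization = 0.8; // leave headroom for the OS and other pools
double waitTimeMs = 50.0;       // time a task spends blocked on I/O (assumed)
double computeTimeMs = 5.0;     // time a task spends on-CPU (assumed)

// N_threads = N_cores × target_utilization × (1 + wait_time / compute_time)
int nThreads = (int) Math.ceil(
        cores * targetUtilization * (1 + waitTimeMs / computeTimeMs));
// e.g. 8 cores → ceil(8 × 0.8 × 11) = 71 threads for this I/O-heavy mix
```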
The bounded queue does more work than it seems
The queue is the backpressure mechanism
When producers submit faster than workers can handle, the queue fills. With an unbounded queue (the default in Java's newFixedThreadPool!), the queue grows until OOM. With a bounded queue, submission blocks (or rejects) once the queue is full, the producer slows down, naturally limiting the load.
In production, always use a bounded queue. Java: ThreadPoolExecutor with ArrayBlockingQueue(N). Go: buffered channel with explicit capacity. Python: Queue(maxsize=N).
Rejection policies: what happens when the queue is full
When the queue and pool are both saturated, something has to give:
- Throw / raise: surface the overload to the caller. The default in Java.
- Caller runs: the producer thread runs the task itself. Provides natural backpressure. Often the right choice in production.
- Discard newest / oldest: drop tasks. Use only if dropping is OK (metrics, telemetry).
- Block: submission blocks until the queue drains. Same effect as caller-runs but harder to reason about.
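In Java, the first three map onto ThreadPoolExecutor's built-in RejectedExecutionHandler implementations; blocking has no built-in policy and needs a custom handler (or a semaphore around submit()):

```java
RejectedExecutionHandler abort      = new ThreadPoolExecutor.AbortPolicy();         // throw RejectedExecutionException (default)
RejectedExecutionHandler callerRuns = new ThreadPoolExecutor.CallerRunsPolicy();    // run the task on the submitting thread
RejectedExecutionHandler dropNew    = new ThreadPoolExecutor.DiscardPolicy();       // silently drop the new task
RejectedExecutionHandler dropOld    = new ThreadPoolExecutor.DiscardOldestPolicy(); // drop the oldest queued task, retry the submit
```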
Worker pool variants worth knowing
| Variant | When to use |
|---|---|
| Fixed-size pool | Default for production. Predictable memory, sized for workload. |
| Cached / elastic pool | Workloads with bursty patterns and idle periods. ⚠️ Java's newCachedThreadPool is unbounded, use a bounded variant. |
| Process pool | CPU-bound Python work; bypasses GIL. |
| Async + semaphore | High-concurrency I/O; bounds in-flight tasks without spawning threads. |
| Work-stealing pool (ForkJoinPool) | Recursive divide-and-conquer; workers steal from siblings when their own queue empties. |
| Per-key worker | Order-sensitive workloads, each key has one worker so updates stay sequential. |
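As a sketch of the per-key variant (class name and shard count are illustrative assumptions): hash each key onto one single-threaded executor so tasks for the same key run sequentially.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerKeyPool {
    private final ExecutorService[] shards;

    PerKeyPool(int shardCount) {
        shards = new ExecutorService[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = Executors.newSingleThreadExecutor(); // one worker per shard
        }
    }

    void submit(String key, Runnable task) {
        int shard = Math.floorMod(key.hashCode(), shards.length);
        shards[shard].execute(task); // same key → same shard → ordered
    }
}
```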
The shutdown story
Half the worker-pool bugs are at shutdown. The producer stops; workers are still draining the queue. Some are still running long-tail jobs. The JVM/process wants to exit. What happens to in-flight work?
The two-phase pattern: shutdown() → "no new tasks, but finish what's queued" → wait with timeout → shutdownNow() → "interrupt running tasks, drop the queue."
Always pair shutdown with a timeout. Always log "did not terminate cleanly" if the timeout fires. Never just exit and hope.
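In Java the canonical two-phase sequence looks like this (the timeouts are illustrative; pick yours deliberately):

```java
pool.shutdown();                                    // phase 1: no new tasks, drain the queue
try {
    if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        pool.shutdownNow();                         // phase 2: interrupt workers, drop the queue
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            System.err.println("pool did not terminate cleanly");
        }
    }
} catch (InterruptedException e) {
    pool.shutdownNow();                             // re-cancel if interrupted while waiting
    Thread.currentThread().interrupt();             // preserve the interrupt status
}
```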
When NOT to use a worker pool
- One-off background tasks (a single timer): just spawn a dedicated thread.
- Truly cheap, latency-sensitive work: pool overhead dominates the work.
- Workloads where ordering matters per-key: use per-key workers, not a shared pool.
- Tasks that fan out further: a worker pool inside a worker pool can deadlock if pool sizes are wrong (worker A waits on a task that's queued behind worker A in the same pool).
The interview answer that wins
When asked "how should 10K concurrent requests be handled?", the right answer leads with the worker pool architecture, not "spawn a thread per request" or "use async/await." Then dig into: pool sizing, queue bounding, backpressure, shutdown. Demonstrating thought about the failure modes is the engineering signal interviewers want.
Primitives (Java)
- ExecutorService (newFixedThreadPool, newCachedThreadPool, ForkJoinPool)
- ThreadPoolExecutor for fine-grained control
- BlockingQueue (work queue)
- Future / CompletableFuture (results)
Implementations
Spawning a new thread per job kills the service under load. 10K concurrent jobs = 10K threads = 10 GB of stack memory + huge context-switch storm. The thread pool exists to put a ceiling on concurrency.
```java
// BROKEN: unbounded thread creation
void processJobs(List<Job> jobs) {
    for (Job job : jobs) {
        new Thread(() -> handle(job)).start();
        // 10K jobs = 10K threads = 10 GB of stack memory + CPU thrashing
    }
}
```
newFixedThreadPool(N) creates exactly N worker threads. Submitted tasks queue up; workers pull and process. Note that its internal queue is unbounded (see the production variant below). Always call shutdown() explicitly so the JVM can exit cleanly.
```java
ExecutorService pool = Executors.newFixedThreadPool(8);

List<Future<Result>> futures = new ArrayList<>();
for (Job job : jobs) {
    futures.add(pool.submit(() -> handle(job))); // queues work
}

List<Result> results = new ArrayList<>();
for (Future<Result> f : futures) {
    results.add(f.get()); // blocks for each result in submission order
}

pool.shutdown();
pool.awaitTermination(30, TimeUnit.SECONDS);
```
For production, build the pool explicitly: bounded work queue, named threads, sensible rejection policy. The unbounded LinkedBlockingQueue that newFixedThreadPool uses by default hides backpressure problems: bursts grow the queue until OOM.
```java
ThreadPoolExecutor pool = new ThreadPoolExecutor(
    8, 8,                           // core, max threads
    0L, TimeUnit.MILLISECONDS,      // keep-alive (irrelevant for a fixed pool)
    new ArrayBlockingQueue<>(1000), // BOUNDED queue = backpressure
    new ThreadFactoryBuilder().setNameFormat("worker-%d").build(), // Guava
    new ThreadPoolExecutor.CallerRunsPolicy() // overflow → run on caller (slows producer)
);

// CallerRunsPolicy is often the right choice: it provides natural backpressure
// by making the submitter do the work when the pool is saturated.
```
Key points
- Fixed N workers > thread-per-task: bounded memory, predictable overhead
- Workers loop: pull from queue, process, repeat, until the queue closes
- Result delivery: separate result channel, Future objects, or callbacks
- Sizing: ~core count for CPU work, much higher for I/O; use Goetz's formula
- Always have a shutdown story: close queue → workers drain → join all
- Error propagation matters: one worker failing should signal the rest
Tradeoffs
| Option | Pros | Cons | When to use |
|---|---|---|---|
| Fixed-size thread pool | Bounded memory, predictable overhead | Must be sized for the workload; idle threads when load is low | Default for production servers: ExecutorService, ThreadPoolExecutor, errgroup.SetLimit |
| Cached thread pool (newCachedThreadPool) | Elastic: grows with bursts, reclaims idle threads | Unbounded thread creation under sustained load → OOM | Tests, scripts, never production servers |
| Process pool (multiprocessing) | True parallelism; bypasses the GIL | Serialization (pickle) overhead; closures and lambdas don't transfer | CPU-bound Python work |
| Async semaphore-bounded pool | Bounds in-flight tasks without spawning threads | Only helps I/O-bound work; requires an async stack | High-concurrency I/O: web crawlers, fan-out gateways |
Follow-up questions
- How is a worker pool sized?
- What's the right rejection policy when the queue is full?
- Why is newCachedThreadPool() dangerous?
- Should each request get its own thread, or share a pool?
- How are errors propagated from a worker pool?
Gotchas
- Java's newFixedThreadPool uses an unbounded LinkedBlockingQueue by default, so bursts → OOM. Use ThreadPoolExecutor explicitly with an ArrayBlockingQueue.
- Forgetting to call shutdown() on an ExecutorService → the JVM doesn't exit.
- Forgetting to close the work channel in Go → workers block forever waiting for more.
- Calling .get() on a Future inside a worker can deadlock if the pool is full and the worker is waiting for another task to complete (see the sketch after this list).
- A task that throws has its exception swallowed by the Future; always log future.exception() or wrap the task in try/catch.
- ProcessPoolExecutor with closures or lambdas → pickle errors; use top-level functions.
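A minimal repro of the .get()-inside-a-worker deadlock mentioned above; a pool of size 1 makes it deterministic:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDeadlock {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);

        Future<String> outer = pool.submit(() -> {
            // The pool's only thread is busy running this task...
            Future<String> inner = pool.submit(() -> "child");
            return inner.get(); // ...so `inner` never starts: deadlock
        });

        // outer.get() here would block forever; shut down instead.
        pool.shutdownNow();
    }
}
```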
Common pitfalls
- Sizing pools by intuition instead of measurement
- Mixing CPU and I/O work in one pool, then sizing for only one of them (see the sketch after this list)
- Unbounded work queue: backpressure problems hide until they surface as OOM
- Forgetting graceful shutdown, which drops in-flight work
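One remedy for the mixed-workload pitfall, sketched with assumed sizes and hypothetical task names: give each workload nature its own pool.

```java
// CPU-bound pool: ~core count. I/O-bound pool: much larger (64 is an assumption).
ExecutorService cpuPool = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors());
ExecutorService ioPool = Executors.newFixedThreadPool(64);

cpuPool.submit(() -> resizeImage(image)); // hypothetical CPU-heavy task
ioPool.submit(() -> fetchFromS3(key));    // hypothetical blocking I/O task
```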
Practice problems
- BlockingQueue + N Thread workers in a loop, with a sentinel for shutdown
- Worker pool + token bucket: workers wait for a token before processing
APIs worth memorising
- Java: ExecutorService, ThreadPoolExecutor, ForkJoinPool, BlockingQueue, ThreadFactory, RejectedExecutionHandler
- Python: concurrent.futures.{ThreadPoolExecutor, ProcessPoolExecutor}, multiprocessing.Pool, asyncio.Semaphore
- Go: errgroup.Group, sync.WaitGroup + channel, golang.org/x/sync/semaphore
Every server uses this pattern: nginx, Apache, and Tomcat run worker thread pools. Kafka consumer groups distribute partitions across worker threads. Celery uses a process pool. ML inference services use ThreadPoolExecutor or virtual threads to fan out batched requests. The worker pool is the single most-used concurrency pattern in production.