Backpressure: Strategies and Signals
When the producer is faster than the consumer, something has to give. Backpressure is how the system tells the producer to slow down (or what happens when it can't). Four strategies: block the producer, drop new work, shed load with errors, or propagate the slowdown back through the call chain. The wrong choice turns a slow consumer into a memory blowup or a thundering retry storm.
What it is
Backpressure is what happens when one part of a system can't keep up with another. Producer faster than consumer. Downstream slower than upstream. Without an explicit strategy, the slow side has only bad options: things pile up in memory until the process dies, or work gets silently dropped, or threads block forever.
Picking a backpressure strategy turns "slow consumer crashes the system" into "slow consumer slows the producer."
The basic shape:
Producer -----> Queue -----> Consumer
(fast) (filling) (slow)
As the queue fills, the system has to decide: what happens when it's full?
The four strategies, with pictures
1. BLOCK Queue full -> producer waits.
Producer --> [████████████] --> Consumer
^ full
Producer is blocked, Consumer drains, then producer can push again.
Producer's rate caps at consumer's rate automatically.
2. DROP Queue full -> throw away new (or oldest).
Producer --> [████████████] --> Consumer
^ full, drop new item or evict oldest
New events: silently lost OR oldest events lost.
Counter goes up: "dropped 1,234 events in last minute."
3. SHED Queue full -> reject incoming with an error.
Producer --> (503 Service Unavailable, Retry-After: 5s)
Caller decides: retry, fall back, fail-fast, degrade.
4. PROPAGATE Pass a deadline through every layer.
Producer --> [....] --> Consumer (slow)
    |
    |  "must finish by t+5s" passes through every call
    |
Each layer checks the deadline. If exceeded, fail fast and
propagate the timeout upward.
When each one fits
BLOCK Use when:
- Producer is a worker (not a request handler)
- Capping producer rate at consumer rate is acceptable
- Response latency is not a concern
Examples: batch ETL, log shipping, background workers
DROP Use when:
- The data is fungible (one sample as good as another)
- Stale is worse than missing
Examples: metrics, telemetry, sensor readings, log sampling
ALWAYS instrument the drop count.
SHED Use when:
- The code is at the edge of a service
- Queueing the request buys nothing (it would time out anyway)
- Caller can retry / fall back
Examples: HTTP request handlers under load, API gateways
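The shed strategy can be sketched with a semaphore that never waits: if no permit is available, the handler rejects immediately instead of queueing. The limit of 100 and the handler shape are illustrative assumptions, not a fixed recipe.

```java
import java.util.concurrent.Semaphore;

// Assumed capacity: 100 in-flight requests; tune per service.
Semaphore permits = new Semaphore(100);

String handle(String request) {
    if (!permits.tryAcquire()) {  // no waiting: full means reject now
        return "503 Service Unavailable, Retry-After: 5";
    }
    try {
        return "200 OK";          // stand-in for the real work
    } finally {
        permits.release();        // free the slot even if work throws
    }
}
```

The key property is tryAcquire() rather than acquire(): the handler thread is never parked waiting for capacity that isn't coming.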
PROPAGATE Use when:
- The code crosses service boundaries
- The caller cares about end-to-end latency
- Timeouts should free up resources cheaply
Examples: RPC chains, microservice calls
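One way to sketch deadline propagation (names are illustrative): carry an absolute deadline through every call, and have each hop derive its timeout from the remaining budget instead of starting a fresh one.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Absolute deadline: a point in time, not a per-hop duration.
long deadlineNanos(Duration total) {
    return System.nanoTime() + total.toNanos();
}

long remainingMillis(long deadline) {
    return TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
}

String callDownstream(long deadline) throws TimeoutException {
    long budget = remainingMillis(deadline);
    if (budget <= 0) {  // deadline already blown: fail fast, free resources
        throw new TimeoutException("deadline exceeded");
    }
    // Pass `budget` (not a fresh timeout) to the next hop, e.g. as an
    // HTTP header or RPC metadata, so the limit shrinks at every layer.
    return "ok with " + budget + " ms left";
}
```

Because the deadline is absolute, it survives any number of hops; each layer only ever sees less time than its caller had.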
Picking the right one
Match the strategy to the workload:
| Scenario | Strategy |
|---|---|
| Batch / ETL pipeline | Block (bounded queue) |
| Telemetry / metrics / sampled data | Drop with metrics |
| User-facing request handler | Shed with 503 + Retry-After |
| RPC chain across services | Propagate via deadline |
| Mixed workload | Combination: bounded queue inside, deadline at the edge |
The most common mistake is using "block" everywhere. Blocking in a request handler ties up threads waiting for capacity that isn't coming. The right pattern at the edge is shed; the right pattern internally is bounded queue or propagate.
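The mixed-workload row can be sketched by combining the two patterns: a bounded queue inside, with the caller's remaining deadline capping how long the producer waits for space. This is a hypothetical sketch, not a complete pipeline.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

BlockingQueue<String> jobs = new ArrayBlockingQueue<>(1000);

// Block for queue space only as long as the caller's budget allows;
// returns false (shed) instead of waiting past the deadline.
boolean submit(String job, long remainingMillis) throws InterruptedException {
    return jobs.offer(job, remainingMillis, TimeUnit.MILLISECONDS);
}
```

The timed offer() is the bridge: inside the deadline it behaves like block, past the deadline it behaves like shed.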
What goes wrong without it
Three production failure modes directly attributable to missing backpressure:
A queue with no bound. Producer faster than consumer for any sustained period. Memory grows. OOM. Postmortem says "the queue should have been bounded."
A bounded queue that drops silently. No metric, no alert. Symptom: customers report missing data weeks later. The queue has been dropping 5% of events for months.
No deadline propagation. Downstream gets slow. Upstream blocks waiting. Threads pile up. Eventually upstream is unreachable too. The "slow downstream" turned into a "service down" because there was no shedding.
Each one is preventable with a few lines of code, provided backpressure was considered when the data flow was designed.
Backpressure is the question every concurrent system must answer: when supply exceeds capacity, what happens? Pick one of the four strategies (block, drop, shed, propagate) per data flow, document it, and instrument it. The system designs that survive production traffic are the ones where someone made this choice deliberately.
Implementations
The most common backpressure mechanism: a bounded BlockingQueue. When full, put() blocks until a consumer takes. The producer naturally slows to match the consumer's rate. Cost: producer threads sit idle while blocked. Acceptable when the producer has nothing better to do.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

BlockingQueue<Job> q = new ArrayBlockingQueue<>(1000);

// Producer
void produce(Job j) throws InterruptedException {
    q.put(j); // BLOCKS if queue is full
}

// Consumer
Job take() throws InterruptedException {
    return q.take(); // blocks if empty
}

// The bounded size IS the backpressure mechanism. No extra code needed.

For telemetry, log shipping, sampled events: stale data is worse than missing data. offer() returns false instead of blocking; the drop is counted via a metric. Always count drops, or data loss becomes invisible.
import java.util.concurrent.atomic.AtomicLong;

BlockingQueue<Event> q = new ArrayBlockingQueue<>(10_000);
AtomicLong dropped = new AtomicLong();

void emit(Event e) {
    if (!q.offer(e)) {             // returns false if full
        dropped.incrementAndGet(); // never drop silently
    }
}

// For "drop oldest" instead of "drop newest", pre-evict on full:
void emitDropOldest(Event e) {
    while (!q.offer(e)) {
        q.poll();                  // remove oldest, retry
        dropped.incrementAndGet();
    }
}

Key points
- Without backpressure, a slow consumer forces unbounded memory growth or work loss
- Block the producer (bounded queue blocks on full): natural backpressure, default for most pipelines
- Drop incoming (drop newest or drop oldest): for telemetry where stale data is worse than missing data
- Shed load (return 503 / fail fast): for request paths where the client can retry or degrade
- Propagate (the slow downstream signals the upstream caller): for end-to-end flow control across services
Follow-up questions
- What happens without backpressure?
- Block, drop, or shed: how to choose?
- Why is propagation special?
- How does backpressure relate to the bulkhead pattern?
Gotchas
- Unbounded queue = no backpressure = memory blowup under sustained overload
- Drop without metrics = silent data loss (count and alert on drops)
- Shed without Retry-After = client retries immediately, makes things worse
- Block in a request handler = thread starvation; combine with timeouts
- Propagate without timeouts = nothing actually propagates; the deadline needs a cancellation signal the callee actually checks (in Go, the ctx.Done() channel)