Thundering Herd & Retry Storms
When Retries Turn a Small Problem Into a Big One
Retries are the most deceptively dangerous pattern in distributed systems. On the surface, it seems like a no-brainer: if a request fails, just try again. But when you have thousands of clients all following that same logic, you get a feedback loop that can turn a minor blip into a full-blown outage.
The Thundering Herd
Picture a cache server that holds the product catalog for an e-commerce site. It restarts for a routine update, and for 30 seconds there are zero cached entries. Every single request from every user goes straight to the database. If you normally serve 10,000 requests per second from cache with a 1% miss rate, that is 100 database queries per second. During the cold cache window, that number jumps to 10,000 queries per second. That is a 100x spike, and a database provisioned for the normal 100 queries per second has no chance of absorbing it.
The standard fix is cache stampede protection. When the first request finds a cache miss, it grabs a lock and fetches from the database. Every subsequent request for that same key waits for the first one to fill the cache instead of all of them hammering the database independently. This approach (sometimes called "request coalescing" or "single-flight") cuts the database load from N concurrent requests down to exactly one.
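To make the idea concrete, here is a minimal in-process sketch of single-flight protection in Python. The loader function, the plain dictionary used as the cache, and the per-key locking scheme are illustrative assumptions; a production version would add TTLs, error handling, and a distributed lock if multiple servers share the cache.

```python
import threading

class SingleFlightCache:
    """Coalesces concurrent misses for the same key into one loader call."""

    def __init__(self, loader):
        self._loader = loader            # e.g. a hypothetical fetch_from_db(key)
        self._cache = {}                 # key -> cached value
        self._locks = {}                 # key -> lock held by the in-flight fetch
        self._guard = threading.Lock()   # protects the two dicts above

    def get(self, key):
        if key in self._cache:
            return self._cache[key]      # fast path: cache hit

        # Cache miss: find or create the lock for this key.
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())

        with lock:
            # Re-check after acquiring the lock: another request may have
            # filled the cache while we were waiting.
            if key not in self._cache:
                self._cache[key] = self._loader(key)   # exactly one DB call
            return self._cache[key]
```

The key detail is the re-check inside the lock: the first waiter does the database fetch, and everyone queued behind it finds the value already cached.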
How Retry Storms Snowball
Retry storms are worse than thundering herds because they reinforce themselves. Here is how the cycle works: the backend gets slow under load. Clients time out and retry. The retries add more load. The backend gets even slower. More clients time out. More retries fire. Each round amplifies the previous one.
Left unchecked, this grows exponentially. If each client retries 3 times and you have 1,000 clients, one second of slowness produces 3,000 extra requests on top of the original 1,000. Those 4,000 requests cause more timeouts, generating another 12,000 retries. Within minutes, the backend is getting hit with 10-50x its normal load, and almost all of it is retries.
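A toy calculation makes the amplification concrete. It uses the numbers from the paragraph above plus the pessimistic assumption that every request in a round times out and gets retried:

```python
# 1,000 clients, each willing to retry a failed request 3 times.
clients, retries_per_failure = 1_000, 3

load = clients
for round_num in range(1, 4):
    load += load * retries_per_failure   # every request in flight spawns 3 retries
    print(f"after round {round_num}: {load:,} requests")

# after round 1: 4,000 requests
# after round 2: 16,000 requests
# after round 3: 64,000 requests
```

Real systems do not amplify quite this cleanly, since some requests succeed and retries are spread over time, but the direction is the same: the load curve bends upward until something breaks the loop.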
Exponential Backoff With Jitter
The standard answer is exponential backoff: wait 1 second, then 2, then 4 between retries. But there is a subtle catch. If 1,000 clients all start retrying at the same moment, they all compute the same backoff intervals and retry in synchronized waves. Random jitter breaks that synchronization. Instead of all clients retrying at T+1s, they spread out between T+0.5s and T+1.5s, distributing the load more evenly.
The formula looks like: delay = min(cap, base * 2^attempt) * random(0.5, 1.5). AWS recommends "full jitter" where delay = random(0, min(cap, base * 2^attempt)), which gives even better distribution across clients.
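A short sketch of both variants; the 1-second base and 30-second cap are illustrative values, not recommendations:

```python
import random

BASE = 1.0   # seconds before the first retry
CAP = 30.0   # never wait longer than this

def backoff_with_jitter(attempt: int) -> float:
    """Exponential backoff scaled by a random factor between 0.5 and 1.5."""
    return min(CAP, BASE * 2 ** attempt) * random.uniform(0.5, 1.5)

def full_jitter(attempt: int) -> float:
    """AWS-style full jitter: uniform between 0 and the exponential cap."""
    return random.uniform(0, min(CAP, BASE * 2 ** attempt))
```

A retry loop would sleep for `full_jitter(attempt)` between attempts; because each client draws its own random delay, retries land spread across the whole interval instead of in lockstep.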
Retry Budgets
Even with backoff and jitter, clients making retry decisions independently can still overwhelm a backend. Retry budgets address this at the system level by capping the ratio of retries to original requests. Google's SRE practices suggest a retry budget of 10%: if more than 10% of requests in a given time window are retries, stop retrying entirely. This prevents retry storms from forming no matter what individual clients are doing.
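Here is a minimal sliding-window sketch of that idea. The 10% ratio comes from the paragraph above; the 10-second window and the API shape are assumptions, and in practice the counters would be shared across all request paths in the process or tracked by an RPC library.

```python
import time

class RetryBudget:
    """Allows retries only while they stay under a fixed share of traffic."""

    def __init__(self, ratio=0.10, window_seconds=10.0):
        self.ratio = ratio
        self.window = window_seconds
        self.events = []   # (timestamp, is_retry) for recent requests

    def _prune(self, now):
        cutoff = now - self.window
        self.events = [(t, r) for t, r in self.events if t >= cutoff]

    def record(self, is_retry: bool):
        now = time.monotonic()
        self._prune(now)
        self.events.append((now, is_retry))

    def can_retry(self) -> bool:
        """True if sending one more retry keeps retries within the budget."""
        self._prune(time.monotonic())
        total = len(self.events)
        retries = sum(1 for _, is_retry in self.events if is_retry)
        return (retries + 1) <= self.ratio * (total + 1)
```

Callers record every request they send and check `can_retry()` before firing a retry; when the backend is struggling and most requests are failing, the budget runs out and retries stop, which is exactly what breaks the feedback loop.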
Incident Timeline
- T+0m: Cache server restarts, every cached entry is gone (cold cache)
- T+0m: Thousands of requests simultaneously miss cache and slam the database
- T+1m: Database CPU pegs at 100%, query latency jumps 50x
- T+2m: Application timeouts kick in and trigger automatic retries with no backoff
- T+3m: Retry storm multiplies the load by 3-5x, database stops responding
- T+5m: Every service that depends on this database starts failing
- T+10m: Manual intervention needed. Retries disabled, cache warmed back up gradually.
Detection Signals
- Sudden spike in database QPS right after a cache restart or widespread cache miss
- Retry rate hitting 3x or more above normal across multiple clients
- Backend latency climbing even though organic traffic has not grown
- Load balancer showing a growing connection queue depth
Prevention
- Add exponential backoff with jitter to all retry logic
- Use cache stampede protection (either lock-based or probabilistic early expiration)
- Set retry budgets that limit total retries per time window
- Put circuit breakers in place that open before a retry storm can form (see the sketch after this list)
- Have a cache warming procedure ready for cold-start scenarios
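Circuit breakers are the one item above not sketched elsewhere in this section, so here is a minimal version; the failure threshold, reset timeout, and reopen behavior are illustrative choices rather than any particular library's API.

```python
import time

class CircuitBreaker:
    """Stops sending requests to a backend after repeated failures."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True          # closed: traffic flows normally
        # Open: block requests until the reset timeout has elapsed,
        # then allow probes through again.
        return time.monotonic() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None    # close the circuit again

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()   # open (or re-open) the circuit
```

Because the breaker fails fast instead of letting requests pile up and time out, there is nothing left for a retry storm to feed on.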
Key Points
- A thundering herd happens when lots of clients request the same resource at the same time. The textbook example is a cache expiring or restarting.
- Retry storms feed on themselves: each retry adds load, which causes more timeouts, which triggers more retries.
- Exponential backoff without jitter still produces synchronized retry waves. Jitter is what actually spreads the load out.
- Retry budgets put a cap on total retry traffic at the system level, so no single client can overwhelm the backend.
- The safest default is actually to not retry at all. Only add retries once you have confirmed they help, and always pair them with backoff and jitter.
Common Mistakes
- Using fixed retry intervals instead of exponential backoff, which creates synchronized bursts that hit the backend all at once
- Skipping jitter on backoff timers, so all clients compute the same intervals and retry in lockstep
- Retrying operations that are not idempotent, leading to duplicate writes, double charges, or inconsistent state
- Setting the same TTL on all cache keys, which causes mass expiration events that trigger thundering herds