Amdahl's Law
Maximum speedup from parallelisation = 1 / (s + p/N), where s is the serial fraction, p = 1 − s is the parallel fraction, and N is the processor count. Even a small serial fraction caps maximum speedup: 5% serial means at most 20x, no matter how many cores. The implication: optimising the serial part is often the highest-leverage work.
What it is
Amdahl's Law (Gene Amdahl, 1967) gives the maximum speedup achievable by parallelising a workload. The formula:
Speedup(N) = 1 / (s + (1 - s) / N)
Where s is the serial fraction (the part that can't be parallelised) and N is the number of processors.
The shocking implication: even a small serial fraction caps speedup. A workload with 5% serial code can never run more than 20x faster, no matter how many cores are thrown at it. As N → ∞, speedup → 1/s.
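A minimal sketch of the formula in Python, showing how quickly the curve flattens toward the 1/s ceiling (the specific core counts are just illustrative):

```python
def amdahl_speedup(s, n):
    """Maximum speedup for serial fraction s on n processors."""
    return 1.0 / (s + (1.0 - s) / n)

# 5% serial: the asymptotic ceiling is 1/s = 20x. Even 1024 cores
# get close to it but never reach it.
print(amdahl_speedup(0.05, 8))     # ~5.93x on 8 cores
print(amdahl_speedup(0.05, 1024))  # ~19.64x on 1024 cores
print(1 / 0.05)                    # 20x ceiling as N -> infinity
```

Note the diminishing returns: going from 8 to 1024 cores (128x the hardware) buys only about a 3.3x improvement.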
Why it matters
It's a corrective to the intuition "twice the cores = twice the speed". That's true for the parallel part, but the serial part doesn't get faster.
Practical numbers:
| Serial fraction | Max speedup |
|---|---|
| 50% | 2x |
| 10% | 10x |
| 5% | 20x |
| 1% | 100x |
| 0.1% | 1000x |
Massive speedup requires very small serial fractions. Real workloads rarely have less than 1-5% serial overhead (setup, aggregation, I/O coordination), so 20-100x is a typical practical ceiling for parallelisation.
What is "serial" in real systems
Common serial fractions:
- Setup and teardown: initial config load, final result aggregation, opening/closing connections.
- Lock contention: even brief lock holds show up as serial when many threads compete.
- Coordinated I/O: single-threaded log writer, sequential database flush, shared cache invalidation.
- Result merging: parallel computation followed by sequential combine (the "fan-in" half of fan-out/fan-in).
- Inherently sequential algorithms: dynamic programming with chain dependencies, sequential decoders.
Profile the workload at varying core counts. If speedup is sub-linear, the gap is the serial fraction.
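One way to quantify that gap is the Karp–Flatt metric: given a measured speedup S on N cores, the experimentally determined serial fraction is e = (1/S − 1/N) / (1 − 1/N). A sketch, with hypothetical measurements:

```python
def karp_flatt(speedup, n):
    """Experimentally determined serial fraction (Karp-Flatt metric)
    from a measured speedup on n processors."""
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)

# Hypothetical measurements: 6.0x on 8 cores, 9.5x on 16 cores.
print(karp_flatt(6.0, 8))   # ~0.048 -> roughly 5% serial
print(karp_flatt(9.5, 16))  # ~0.046 -> consistent estimate
```

If the estimate stays roughly constant as N grows, Amdahl explains the workload; if it grows with N, there is contention or coherency overhead beyond the serial fraction (USL territory, below).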
How to fight Amdahl
Two strategies.
Reduce the serial fraction. Often the highest-leverage move. Replace a sequential aggregator with a tree-reduce. Replace a global lock with sharded locks. Move single-threaded I/O to a per-thread design. Each shaves the serial part, raising the asymptote.
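The tree-reduce idea can be sketched in a few lines. This version runs sequentially, but each pass combines independent pairs, so the dependency depth is O(log n) instead of the O(n) chain of a sequential fold:

```python
from operator import add

def tree_reduce(op, values):
    """Combine values pairwise: O(log n) parallel depth
    instead of an O(n) sequential fold."""
    while len(values) > 1:
        # Every pair in a pass is independent of the others,
        # so a real implementation could run each pass in parallel.
        pairs = [op(values[i], values[i + 1])
                 for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:          # carry an unpaired trailing element
            pairs.append(values[-1])
        values = pairs
    return values[0]

print(tree_reduce(add, list(range(16))))  # 120, same as sum(range(16))
```

Swapping a sequential aggregator for this shape turns a serial fan-in step into log-depth work, shrinking s and raising the 1/s asymptote.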
Re-architect to remove the serial part. Some serial fractions are removable; others are inherent. If the workload genuinely requires a final sequential step (writing the result to a single sink), it can't be parallelised. But often, multiple sinks, partitioning the output, or eliminating the synchronisation will work.
Amdahl vs USL
Amdahl's Law is the simple upper bound. The Universal Scalability Law (USL, Neil Gunther) is more realistic for real systems:
Speedup(N) = N / (1 + α(N - 1) + β·N(N - 1))
Where α is the contention coefficient and β is the coherency coefficient. The β term lets USL model the case where adding cores actually slows the system down (lock contention overhead, cache coherency traffic). Real systems often peak at some N and degrade beyond it; USL captures that, Amdahl doesn't.
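A quick sketch of the USL curve. The coefficients here are assumed for illustration, not measured; note the peak (at N = √((1 − α)/β) under USL) followed by degradation, which Amdahl's monotone curve can never show:

```python
import math

def usl_speedup(n, alpha, beta):
    """Universal Scalability Law: alpha = contention, beta = coherency."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# Illustrative coefficients: 3% contention, 0.05% coherency cost.
alpha, beta = 0.03, 0.0005
for n in (1, 8, 32, 64, 128):
    print(n, round(usl_speedup(n, alpha, beta), 2))
# Speedup rises, peaks, then falls as the beta term dominates.

print(math.sqrt((1 - alpha) / beta))  # peak near N ~ 44 cores
```

With these coefficients, 128 cores are slower than 32: the coherency term grows quadratically and eventually swamps the parallel gain.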
For most performance reasoning, Amdahl is enough. For capacity planning of large systems, USL is more accurate.
The right sequence when "we need more performance" comes up
Adding cores is rarely the right first move. Measure at varying parallelism levels and compute observed speedup. Compare it to Amdahl's prediction at the suspected serial fraction. Profile to find the serial parts (lock contention, I/O serialisation). Reduce the serial fraction by re-architecting those hotspots. Only then add cores, knowing the new ceiling.
Skipping straight to "add cores" wastes money. Cores are the easy lever; the serial fraction is the actual bottleneck.
Key points
- Speedup(N) = 1 / (s + (1-s)/N). As N → ∞, speedup → 1/s.
- 5% serial = 20x maximum speedup. 1% serial = 100x. 10% = 10x.
- Real systems have multiple serial parts: I/O, lock contention, sequential bottlenecks.
- Highest-leverage optimisation: reduce the serial fraction, not just add cores.
- Universal Scalability Law (USL) extends Amdahl with contention and coherency terms, often more realistic.
Follow-up questions
- What's the practical lesson?
- How does Amdahl differ from USL?
- Does Amdahl apply to distributed systems?
- Why don't speedups in benchmarks match Amdahl predictions?
Gotchas
- Adding cores past the Amdahl ceiling is wasted money.
- "Embarrassingly parallel" workloads still have serial setup/teardown that limits speedup at scale.
- Lock contention is a hidden serial fraction; profile to find it.
- Per-core wins (better algorithm, less contention) sometimes beat throwing more cores at the problem.
- Real systems are often USL-shaped (degradation past a peak), not Amdahl-shaped (asymptote).