Concurrency vs Parallelism
Concurrency is structuring a program to deal with many things at once; parallelism is actually doing many things at the same time. One can exist without the other.
Diagram
Two words, two different things
CONCURRENCY                         PARALLELISM
-----------                         -----------
About STRUCTURE                     About EXECUTION
"How my program is organized"       "What's actually running right now"
Tasks can take turns on 1 core      Tasks really run at the same instant
                                    on different hardware
A program can be:
- Concurrent on one core (tasks take turns; asyncio, Python with the GIL).
- Parallel without much concurrency (a SIMD matrix multiply).
- Both at once (a multi-threaded web server on multiple cores).
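The first case can be sketched with asyncio — a minimal, hypothetical three-task example where every task runs on one thread and each `await` hands the core to the next task:

```python
import asyncio

async def task(name: str, log: list) -> None:
    # Each await hands control back to the event loop,
    # so the three tasks interleave on a single thread.
    for step in range(3):
        log.append(f"{name}{step}")
        await asyncio.sleep(0)  # yield to the other tasks

async def main() -> list:
    log = []
    await asyncio.gather(task("A", log), task("B", log), task("C", log))
    return log

log = asyncio.run(main())
print(log)  # steps from A, B, C interleaved, not grouped by task
```

Only one task ever executes at a given instant, yet all three make progress: concurrency without parallelism.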
A picture
In each picture below, read each vertical column top to bottom: that column shows what is running on every core at that wall-clock instant.
ONE CORE: Concurrency only
Wall clock -------------------------------------->
Task A: ███ ░░░ ░░░ ███ ░░░ ░░░ ███ ░░░ ░░░
Task B: ░░░ ███ ░░░ ░░░ ███ ░░░ ░░░ ███ ░░░
Task C: ░░░ ░░░ ███ ░░░ ░░░ ███ ░░░ ░░░ ███
███ = running on the core
░░░ = waiting its turn
At every column, exactly ONE task has ███. The OS rotates the
three tasks through the single core. The user feels concurrency,
but only 1 task is actually executing at any wall-clock instant.
FOUR CORES: Concurrency + Parallelism
Wall clock -------------------------------------->
Core 0: ███ ███ ███ ███ ███ ███ ███ ███ ███ (Task A)
Core 1: ███ ███ ███ ███ ███ ███ ███ ███ ███ (Task B)
Core 2: ███ ███ ███ ███ ███ ███ ███ ███ ███ (Task C)
Core 3: ███ ███ ███ ███ ███ ███ ███ ███ ███ (Task D)
At every column, FOUR tasks have ███. They really do run at
the same wall-clock instant on different physical cores.
That's parallelism.
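The four-core picture corresponds to process-based parallelism in Python. A minimal sketch with `multiprocessing.Pool` (the function and inputs are illustrative):

```python
from multiprocessing import Pool

def square(n: int) -> int:
    # CPU work; each worker process has its own interpreter and GIL,
    # so chunks can execute at the same wall-clock instant.
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # up to 4 tasks truly in parallel
        results = pool.map(square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The `__main__` guard matters: on platforms that spawn workers by re-importing the module, omitting it causes infinite process creation.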
The kitchen analogy: one chef switching between three dishes is concurrent. Three chefs each cooking one dish is parallel. Three chefs each rotating across three dishes is both.
Why people confuse them and pay for it
Most "make this faster with threads" instincts are wrong because the developer mixed up the two. Adding threads to CPU-bound Python code makes it slower (the GIL serializes execution; the result is concurrency overhead with no parallel speedup). Spinning up 10,000 goroutines for CPU-bound work doesn't help past the core count. And async/await on a single-threaded event loop provides concurrency but zero parallel speedup.
Rob Pike's line sums it up: "Concurrency is not parallelism, but it enables it." Concurrency is the design: independent tasks, well-defined coordination. Parallelism is what emerges when the runtime maps those tasks onto multiple cores.
The single test
Ask: "At any given instant, how many tasks are executing CPU instructions?"
- One = concurrency only.
- More than one = parallelism. (And if the code is structured for it, concurrency is present too.)
When to reach for what
| Workload | Right choice |
|---|---|
| 10K I/O-bound connections, low CPU | Concurrency (asyncio, goroutines, virtual threads) |
| Bulk numerical compute, single dataset | Parallelism (multiprocessing, ForkJoinPool, GPU) |
| Web server: many requests, each does CPU + I/O | Both (threads/goroutines on multi-core) |
| Stream processing pipeline | Both, with explicit stages |
The trap
"I'll use threads to make this faster." If the work is CPU-bound in Python, threads make it slower: the GIL forces serialization while context-switch cost adds up. Reach for multiprocessing instead, or rewrite the hot path.
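The trap in code — a hypothetical CPU-bound function fanned out over a thread pool. The results are correct, but on CPython the GIL means the threads take turns rather than running in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

def busy_sum(n: int) -> int:
    # Pure Python CPU work: holds the GIL the whole time it runs
    return sum(range(n))

with ThreadPoolExecutor(max_workers=4) as pool:
    # Four threads, but only one executes Python bytecode at a time;
    # this adds context-switch overhead with no parallel speedup.
    results = list(pool.map(busy_sum, [10_000] * 4))

print(results)  # four identical sums; correct, just not faster
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` (plus a `__main__` guard) is the usual fix for this shape of workload.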
How this changes the code
- Concurrency forces designs around coordination: shared state, channels, locks, cancellation.
- Parallelism forces designs around data dependencies: what must happen before what; can the work be split into independent shards.
The two design conversations are different. A good interviewer probes both, "how do these tasks share state?" (concurrency) and "what's the speedup with more cores?" (parallelism).
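The concurrency conversation ("how do these tasks share state?") often resolves to message passing rather than shared mutable state. A sketch using a thread-safe queue as a channel (the producer/consumer names and payloads are illustrative):

```python
import queue
import threading

def producer(q: queue.Queue) -> None:
    for item in range(5):
        q.put(item)   # hand work to the consumer; no shared mutable state
    q.put(None)       # sentinel: signals completion

def consumer(q: queue.Queue, out: list) -> None:
    while True:
        item = q.get()
        if item is None:  # sentinel seen: stop
            break
        out.append(item * 10)

q: queue.Queue = queue.Queue()
out: list = []
t1 = threading.Thread(target=producer, args=(q,))
t2 = threading.Thread(target=consumer, args=(q, out))
t1.start(); t2.start()
t1.join(); t2.join()
print(out)  # [0, 10, 20, 30, 40]
```

The queue handles the locking internally, so the coordination question is answered by the structure rather than by manual locks.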
Key points
- Concurrency = composition of independently executing tasks (a structure)
- Parallelism = simultaneous execution of multiple tasks (an outcome)
- Single-core CPU + threads = concurrency without parallelism (interleaved)
- GPU compute or multi-core SIMD = parallelism without classical concurrency
- Web server handling 10K connections = concurrency; matrix multiplication on 8 cores = parallelism
- Async/event loops give concurrency without threads: single-threaded but interleaved
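The last point can be checked directly: every task in an asyncio program reports the same OS thread id. A small sketch:

```python
import asyncio
import threading

async def report() -> int:
    await asyncio.sleep(0)        # yield, proving we resumed via the loop
    return threading.get_ident()  # OS thread running this coroutine

async def main() -> set:
    ids = await asyncio.gather(*(report() for _ in range(10)))
    return set(ids)

thread_ids = asyncio.run(main())
print(len(thread_ids))  # 1 — ten concurrent tasks, one thread
```

Ten tasks, one thread: concurrency is real, parallelism is absent.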
Tradeoffs
| Option | Pros | Cons | When to use |
|---|---|---|---|
| Concurrency without parallelism (asyncio, single-threaded event loop) | Cheap tasks, low memory per connection, no data races between awaits | One CPU-bound task blocks everything; no multi-core speedup | I/O-bound workloads, web servers, message brokers, network proxies |
| Parallelism without concurrency (data-parallel) | Scales with core count; simple mental model (split, compute, merge) | No help for I/O-bound work; process/IPC and memory-copy overhead | CPU-bound batch jobs, image processing, numerical compute, bulk transforms |
| Concurrency with parallelism (threads/goroutines on multi-core) | Uses all cores and overlaps I/O | Shared-state bugs: races, deadlocks, lock contention | Most production servers: handle many requests, use all cores |
Follow-up questions
- If Python's GIL prevents parallel threads, why use threading at all?
- Is async/await concurrent, parallel, or both?
- Can a single-threaded program be concurrent?
- Why does Rob Pike say "concurrency is not parallelism"?
Gotchas
- More threads != more parallelism: once thread count exceeds core count, the cost is context-switching without throughput gain
- asyncio is single-threaded: a CPU-bound task in an async function blocks the entire event loop
- Thread pools sized for I/O are wrong for CPU-bound work, and vice versa
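The asyncio gotcha above has a standard escape hatch: push the CPU-bound call off the event loop into a process pool. A sketch using `loop.run_in_executor` (the `crunch` function is illustrative):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n: int) -> int:
    # CPU-bound work; would block the event loop if awaited inline
    return sum(i * i for i in range(n))

async def main() -> int:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Runs in a worker process: the event loop stays free
        # to service other tasks while this computes.
        return await loop.run_in_executor(pool, crunch, 1000)

if __name__ == "__main__":
    result = asyncio.run(main())
    print(result)  # 332833500
```

For blocking I/O (rather than CPU work) inside async code, `asyncio.to_thread` is the lighter-weight alternative, since the GIL is released during the syscall.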
Common pitfalls
- Adding threads to a CPU-bound Python program: the GIL serializes them and performance gets worse
- Spinning up goroutines in a tight loop without bounded concurrency: a context-switch storm
Every production system makes this distinction. nginx is concurrent + parallel (worker processes per core, async I/O within each). Redis is concurrent but mostly single-threaded (single event loop, separate threads only for I/O syscalls).