Async vs Blocking I/O
Blocking I/O: one thread per concurrent operation, thread sleeps during I/O wait. Async I/O: one thread handles many operations via an event loop, syscalls are non-blocking with callbacks/coroutines. Async wins for many concurrent connections; blocking wins for few but heavy ones, or when the language doesn't have great async support.
Two ways to handle waiting
Every concurrent system has to answer one question: when a thread is waiting for the network or for disk, what does it do in the meantime? There are two main answers, and most of the design decisions in modern servers come from picking between them.
Blocking I/O: one thread per connection
In the blocking model, every open connection has its own thread. When that thread calls read on a socket, the kernel notices there is no data yet, marks the thread as "asleep", and runs some other thread on the CPU. When data finally arrives, the kernel wakes the thread up and the read call returns. From the thread's point of view, the read just took a while.
The operating system is doing the work of switching between connections. The code looks like normal synchronous code: call, wait, get a value. The price is one thread per connection, and threads are not cheap.
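A minimal sketch of this model in Python (a hypothetical echo server on an ephemeral localhost port): each accepted connection gets its own thread, and `recv` parks that thread until data arrives.

```python
import socket
import threading

def handle(conn):
    # One thread per connection: this thread sleeps inside recv()
    # while it waits for data; the OS runs other threads meanwhile.
    with conn:
        while True:
            data = conn.recv(1024)   # blocking call: thread parks here
            if not data:
                break                # client closed the connection
            conn.sendall(data)       # echo the bytes back

def serve(server):
    while True:
        try:
            conn, _ = server.accept()   # blocks until a client connects
        except OSError:
            break                       # listening socket was closed
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

# Demo: start the server, echo one message through it.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server,), daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"hello")
    reply = c.recv(1024)
print(reply)  # b'hello'
server.close()
```

The handler reads like straight-line code; all the switching between connections happens inside the kernel's scheduler.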
Async I/O: one thread, an event loop, many connections
In the async model, a single thread runs a small loop. On each turn it asks the kernel a question: "out of all these connections, which ones have data ready right now?" The kernel answers with a list (on Linux this is epoll_wait, on macOS kqueue; Windows's IOCP achieves the same effect with a completion model, reporting finished operations rather than ready sockets). The loop then runs the handler for each ready connection, in turn, on the same thread. Handlers are expected to do their bit of work and return quickly so the loop can move on.
There are no extra threads parked in the kernel waiting for data. There is one thread, busy when there is work, idle when there is not. Per-connection cost drops from "one stack" to "a small bit of state for the handler".
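The same echo behaviour in the async style, sketched with Python's selectors module (a thin wrapper over epoll/kqueue). `run_once` is a hypothetical helper, not library API: it performs one turn of the loop described above.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on macOS

def accept(server):
    conn, _ = server.accept()       # won't block: the kernel said it's ready
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    data = conn.recv(1024)          # won't block: the kernel said it's ready
    if data:
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)
port = server.getsockname()[1]

def run_once(timeout=1.0):
    # One turn of the loop: ask the kernel which sockets are ready,
    # then run the handler registered for each of them.
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)

# Demo: everything below the client connect happens on one thread.
with socket.create_connection(("127.0.0.1", port)) as c:
    run_once()            # turn 1: listening socket ready -> accept()
    c.sendall(b"ping")
    run_once()            # turn 2: connection readable -> echo()
    reply = c.recv(1024)
print(reply)  # b'ping'
server.close()
sel.close()
```

Note there is no thread per connection: the per-connection state is just the socket and its registered handler.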
The two models pick the same job ("decide which connection to work on next") and just put the picker in different places. In blocking, the OS picks. In async, the event loop picks.
When async wins
Async wins for workloads with many connections, each doing a little bit of work. Think web servers, proxies, real-time chat, push notifications, gateways, anything that holds a lot of open sockets but processes only a small request on each.
The arithmetic explains why. A platform (OS) thread reserves stack space when it is created. On 64-bit Linux the default with glibc pthreads is 8 MB of virtual address space per thread (mostly not committed to physical memory until the stack is actually used). On the JVM the default is more like 512 KB to 1 MB. Either way, ten thousand threads is a lot of address space and a lot of context switching for the kernel. With async, ten thousand connections sit on one thread; the per-connection cost is a few hundred bytes of coroutine state.
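A rough back-of-the-envelope check in Python (the 8 MB figure is the glibc default mentioned above; the coroutine object is only part of the real per-connection state, so treat the comparison as order-of-magnitude):

```python
import sys

async def handler():
    pass

coro = handler()                 # per-connection state in the async model
coro_size = sys.getsizeof(coro)  # a couple hundred bytes in CPython
coro.close()                     # close it so Python doesn't warn "never awaited"

# 10,000 platform threads at the glibc default 8 MB stack reservation:
gib = 10_000 * 8 * 1024 * 1024 // 2**30
print(coro_size, gib)            # gib is 78 (GiB of virtual address space)
```

Hundreds of bytes versus tens of gigabytes of reservation is the gap that makes async attractive at high connection counts.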
The visible result at high concurrency: more requests per second per server, and more predictable tail latency under load (no thread-creation spike, no context-switch storms).
When blocking wins
Three cases where blocking is the right answer.
Low concurrency. A command-line tool that makes one HTTP call. A cron job that processes a hundred files in sequence. An admin endpoint that sees one request a minute. There is no thread-count problem to solve, so async only adds complexity.
Heavy CPU per request. Image resizing, ML inference, encoding, large data transformations. The I/O is small compared to the compute. Async does not help when the bottleneck is the CPU, and you need real threads or processes anyway to use multiple cores.
Language and library ecosystem. Python's asyncio is mature, but a lot of widely used libraries are sync-only. If you have to keep dropping back into a thread to use them, you have all the cost of async with little of the benefit. For one-off scripts, sync code is just easier.
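A sketch of that "drop back into a thread" bridge, assuming a hypothetical sync-only helper `legacy_fetch`: `asyncio.to_thread` runs it on a worker thread so the event loop keeps serving other tasks.

```python
import asyncio
import time

def legacy_fetch():
    # Stand-in for a sync-only library call (hypothetical name).
    # Called directly from a coroutine, it would freeze the loop.
    time.sleep(0.05)
    return "data"

async def handler():
    # Bridge: run the sync call on a worker thread; the loop stays free.
    return await asyncio.to_thread(legacy_fetch)

result = asyncio.run(handler())
print(result)  # data
```

If most of your calls end up wrapped like this, you are paying for an event loop while a thread pool does the actual waiting.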
The 'colored function' problem
In Python and JavaScript, async functions and regular functions are not interchangeable. An async function returns a coroutine; you can only call it usefully from another async function with await. Sync code cannot just call async code and use the result.
This effect spreads upward through a codebase. The HTTP handler is async, so the framework has to be async, so the database driver has to be async, so the helper that calls the database has to be async, and so on. Codebases that mix the two are painful to live in; you end up writing bridge code that runs an event loop just to call into a sync helper, or runs a thread just to call into an async helper.
This is sometimes called the "what color is your function?" problem, after a well-known blog post. It is the main reason teams hesitate before adopting async/await in a language that already has lots of sync code.
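A minimal Python illustration, with a hypothetical `fetch_user` coroutine: the sync caller cannot simply call it and use the result, so it needs bridge code that runs an event loop.

```python
import asyncio

async def fetch_user():            # an "async-colored" function
    await asyncio.sleep(0)         # stand-in for real async I/O
    return {"id": 1}

def sync_handler():
    # fetch_user() here only builds a coroutine object; nothing runs.
    # Sync code needs a bridge: spin up an event loop to get the value.
    return asyncio.run(fetch_user())

async def async_handler():
    return await fetch_user()      # async code calls it naturally

sync_result = sync_handler()
async_result = asyncio.run(async_handler())
print(sync_result, async_result)
```

The bridge works for a script's entry point, but `asyncio.run` cannot be called from code that is already running inside a loop, which is exactly why the color tends to spread upward instead.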
Virtual threads and goroutines: the third option
Java 21 added virtual threads. Go has had goroutines from the start. Both blur the line between blocking and async.
The code reads like blocking code. You call a function, it blocks, it returns a value. Behind the scenes, the runtime is doing what an event loop does. When a virtual thread or goroutine hits a blocking I/O call, the runtime quietly parks it and runs a different one on the same OS thread. When the I/O finishes, the runtime wakes it up. Many virtual threads or goroutines share a small pool of real OS threads.
The benefit: blocking-style code with the throughput of async. No colored functions, no callbacks, no await chains, no event loop in your stack traces. Just code that reads naturally.
This is the direction the industry is moving in. Python and JavaScript may eventually get something similar; today they still have async/await with all the function-coloring pain that goes with it.
Picking a model today
For a new service, the choice mostly depends on the language:
- Java. Virtual threads, Java 21 or newer. Blocking-style code, async-class throughput. Default choice for new services.
- Go. Goroutines. The same idea, more mature, and with no separate "virtual" concept because every goroutine already works this way.
- Python. async/await with FastAPI, aiohttp, and asyncio for I/O-heavy services. Threads for sync libraries or CPU-bound work. Free-threaded Python (the GIL-removal effort, PEP 703) is still maturing as a third option.
- JavaScript and Node. async/await is the only option because the runtime is single-threaded. Easier than Python in practice because almost the entire ecosystem is async.
- Rust. tokio plus async/await. Fast and mature, but has its own version of the colored-function problem.
For an existing service, switching from one model to the other is usually expensive. Pick the right one up front.
Implementations
Java 21 introduced virtual threads: lightweight threads scheduled by the JVM, not the OS. The code is blocking-style (executor.submit + blocking I/O); the JVM unmounts the virtual thread during I/O so other virtual threads can run on the same OS thread. Same code, async-like efficiency.
import java.util.concurrent.Executors;

// Old: platform threads, expensive (~1 MB stack each)
try (var pool = Executors.newFixedThreadPool(200)) {
    for (var url : urls) {
        pool.submit(() -> fetch(url));
    }
}

// New: virtual threads, cheap (~a few KB per thread)
try (var pool = Executors.newVirtualThreadPerTaskExecutor()) {
    for (var url : urls) {
        pool.submit(() -> fetch(url)); // blocking call is fine
    }
}

// Same code structure, but now 10,000+ concurrent virtual threads are feasible

Key points
- Blocking model: thread per request. Each thread has memory cost (pthreads default ~8 MB virtual / a few hundred KB committed; JVM default 512 KB-1 MB), thread-creation cost, and context-switch cost.
- Async model: an event loop reads ready I/O events and dispatches to coroutines. One thread can multiplex thousands of connections.
- Async needs the entire stack to be async: a blocking call inside an async function freezes the loop.
- Choose based on workload: many connections + small per-connection work = async. Few connections + heavy work = threads.
- Virtual threads (Java 21+, Go goroutines) blur the line: write blocking-style code, the runtime provides async-like efficiency.
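The "blocking call inside an async function freezes the loop" point can be measured in a few lines of asyncio (task names are made up for the sketch): three tasks that await overlap their waits, while three tasks that block serialize the whole loop.

```python
import asyncio
import time

async def polite(delay):
    await asyncio.sleep(delay)     # yields: other tasks run while waiting

async def rude(delay):
    time.sleep(delay)              # blocks: the whole loop freezes

async def timed(task, n=3, delay=0.1):
    t0 = time.perf_counter()
    await asyncio.gather(*(task(delay) for _ in range(n)))
    return time.perf_counter() - t0

concurrent = asyncio.run(timed(polite))   # ~0.1 s: the waits overlap
serial = asyncio.run(timed(rude))         # ~0.3 s: the waits stack up
print(concurrent < serial)
```

One `time.sleep` is enough to stall every connection on the loop, which is why sync calls hidden inside libraries are the classic async performance bug.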
Follow-up questions
- Why is async usually faster for I/O?
- When is blocking actually fine?
- Are virtual threads (Java) and goroutines (Go) async or blocking?
- What is the 'colored function' problem in async languages?
Gotchas
- Calling blocking code from async freezes the event loop.
- Mixing thread-based primitives (e.g. threading.Lock) with asyncio invites races; use asyncio.Lock inside coroutines.
- Per-thread memory cost adds up: 10K threads at a 1 MB stack each is ~10 GB of stack address space (mostly virtual).
- Virtual threads inherit thread-locals, but their stack traces look different from platform threads'.
- Async stack traces are harder to read (the event loop sits in the middle of them).