Async vs Blocking I/O
Blocking I/O: one thread per concurrent operation, thread sleeps during I/O wait. Async I/O: one thread handles many operations via an event loop, syscalls are non-blocking with callbacks/coroutines. Async wins for many concurrent connections; blocking wins for few but heavy ones, or when the language doesn't have great async support.
Two ways to handle waiting
Every concurrent system has to answer one question: when a thread is waiting for the network or for disk, what does it do in the meantime? There are two main answers, and most of the design decisions in modern servers come from picking between them.
Blocking I/O: one thread per connection
In the blocking model, every open connection has its own thread. When that thread calls read on a socket, the kernel notices there is no data yet, marks the thread as "asleep", and runs some other thread on the CPU. When data finally arrives, the kernel wakes the thread up and the read call returns. From the thread's point of view, the read just took a while.
The operating system is doing the work of switching between connections. The code looks like normal synchronous code: call, wait, get a value. The price is one thread per connection, and threads are not cheap.
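A minimal sketch of this model in Python (a hypothetical echo server on an ephemeral localhost port): each accepted connection gets its own thread, and `recv` parks that thread until data arrives.

```python
import socket
import threading

def handle(conn):
    # One thread per connection: this thread sleeps inside recv()
    # while it waits for data; the OS runs other threads meanwhile.
    with conn:
        while True:
            data = conn.recv(1024)   # blocking call: thread parks here
            if not data:
                break                # client closed the connection
            conn.sendall(data)       # echo the bytes back

def serve(server):
    while True:
        try:
            conn, _ = server.accept()   # blocks until a client connects
        except OSError:
            break                       # listening socket was closed
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

# Demo: start the server, echo one message through it.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
port = server.getsockname()[1]
threading.Thread(target=serve, args=(server,), daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"hello")
    reply = c.recv(1024)
print(reply)  # b'hello'
server.close()
```

The handler reads like straight-line code; all the switching between connections happens inside the kernel's scheduler.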
Async I/O: one thread, an event loop, many connections
In the async model, a single thread runs a small loop. On each turn it asks the kernel a question: "out of all these connections, which ones have data ready right now?" The kernel answers with a list (on Linux this is epoll_wait, on macOS kqueue; Windows's IOCP achieves the same effect with a completion model, reporting finished operations rather than ready sockets). The loop then runs the handler for each ready connection, in turn, on the same thread. Handlers are expected to do their bit of work and return quickly so the loop can move on.
There are no extra threads parked in the kernel waiting for data. There is one thread, busy when there is work, idle when there is not. Per-connection cost drops from "one stack" to "a small bit of state for the handler".
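The same echo behaviour in the async style, sketched with Python's selectors module (a thin wrapper over epoll/kqueue). `run_once` is a hypothetical helper, not library API: it performs one turn of the loop described above.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on macOS

def accept(server):
    conn, _ = server.accept()       # won't block: the kernel said it's ready
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    data = conn.recv(1024)          # won't block: the kernel said it's ready
    if data:
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)
port = server.getsockname()[1]

def run_once(timeout=1.0):
    # One turn of the loop: ask the kernel which sockets are ready,
    # then run the handler registered for each of them.
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)

# Demo: everything below the client connect happens on one thread.
with socket.create_connection(("127.0.0.1", port)) as c:
    run_once()            # turn 1: listening socket ready -> accept()
    c.sendall(b"ping")
    run_once()            # turn 2: connection readable -> echo()
    reply = c.recv(1024)
print(reply)  # b'ping'
server.close()
sel.close()
```

Note there is no thread per connection: the per-connection state is just the socket and its registered handler.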
The two models pick the same job ("decide which connection to work on next") and just put the picker in different places. In blocking, the OS picks. In async, the event loop picks.
When async wins
Async wins for workloads with many connections, each doing a little bit of work. Think web servers, proxies, real-time chat, push notifications, gateways, anything that holds a lot of open sockets but processes only a small request on each.
The arithmetic explains why. A platform (OS) thread reserves stack space when it is created. On 64-bit Linux the default with glibc pthreads is 8 MB of virtual address space per thread (mostly not committed to physical memory until the stack is actually used). On the JVM the default is more like 512 KB to 1 MB. Either way, ten thousand threads is a lot of address space and a lot of context switching for the kernel. With async, ten thousand connections sit on one thread; the per-connection cost is a few hundred bytes of coroutine state.
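A rough back-of-the-envelope check in Python (the 8 MB figure is the glibc default mentioned above; the coroutine object is only part of the real per-connection state, so treat the comparison as order-of-magnitude):

```python
import sys

async def handler():
    pass

coro = handler()                 # per-connection state in the async model
coro_size = sys.getsizeof(coro)  # a couple hundred bytes in CPython
coro.close()                     # close it so Python doesn't warn "never awaited"

# 10,000 platform threads at the glibc default 8 MB stack reservation:
gib = 10_000 * 8 * 1024 * 1024 // 2**30
print(coro_size, gib)            # gib is 78 (GiB of virtual address space)
```

Hundreds of bytes versus tens of gigabytes of reservation is the gap that makes async attractive at high connection counts.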
The visible result at high concurrency: more requests per second per server, and more predictable tail latency under load (no thread-creation spike, no context-switch storms).
When blocking wins
Three cases where blocking is the right answer.
Low concurrency. A command-line tool that makes one HTTP call. A cron job that processes a hundred files in sequence. An admin endpoint that sees one request a minute. There is no thread-count problem to solve, so async only adds complexity.
Heavy CPU per request. Image resizing, ML inference, encoding, large data transformations. The I/O is small compared to the compute. Async does not help when the bottleneck is the CPU, and you need real threads or processes anyway to use multiple cores.
Language and library ecosystem. Python's asyncio is mature, but a lot of widely used libraries are sync-only. If you have to keep dropping back into a thread to use them, you have all the cost of async with little of the benefit. For one-off scripts, sync code is just easier.
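A sketch of that "drop back into a thread" bridge, assuming a hypothetical sync-only helper `legacy_fetch`: `asyncio.to_thread` runs it on a worker thread so the event loop keeps serving other tasks.

```python
import asyncio
import time

def legacy_fetch():
    # Stand-in for a sync-only library call (hypothetical name).
    # Called directly from a coroutine, it would freeze the loop.
    time.sleep(0.05)
    return "data"

async def handler():
    # Bridge: run the sync call on a worker thread; the loop stays free.
    return await asyncio.to_thread(legacy_fetch)

result = asyncio.run(handler())
print(result)  # data
```

If most of your calls end up wrapped like this, you are paying for an event loop while a thread pool does the actual waiting.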
The 'colored function' problem
In Python and JavaScript, async functions and regular functions are not interchangeable. An async function returns a coroutine; you can only call it usefully from another async function with await. Sync code cannot just call async code and use the result.
This effect spreads upward through a codebase. The HTTP handler is async, so the framework has to be async, so the database driver has to be async, so the helper that calls the database has to be async, and so on. Codebases that mix the two are painful to live in; you end up writing bridge code that runs an event loop just to call into a sync helper, or runs a thread just to call into an async helper.
This is sometimes called the "what color is your function?" problem, after a well-known blog post. It is the main reason teams hesitate before adopting async/await in a language that already has lots of sync code.
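A minimal Python illustration, with a hypothetical `fetch_user` coroutine: the sync caller cannot simply call it and use the result, so it needs bridge code that runs an event loop.

```python
import asyncio

async def fetch_user():            # an "async-colored" function
    await asyncio.sleep(0)         # stand-in for real async I/O
    return {"id": 1}

def sync_handler():
    # fetch_user() here only builds a coroutine object; nothing runs.
    # Sync code needs a bridge: spin up an event loop to get the value.
    return asyncio.run(fetch_user())

async def async_handler():
    return await fetch_user()      # async code calls it naturally

sync_result = sync_handler()
async_result = asyncio.run(async_handler())
print(sync_result, async_result)
```

The bridge works for a script's entry point, but `asyncio.run` cannot be called from code that is already running inside a loop, which is exactly why the color tends to spread upward instead.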
Virtual threads and goroutines: the third option
Java 21 added virtual threads. Go has had goroutines from the start. Both blur the line between blocking and async.
The code reads like blocking code. You call a function, it blocks, it returns a value. Behind the scenes, the runtime is doing what an event loop does. When a virtual thread or goroutine hits a blocking I/O call, the runtime quietly parks it and runs a different one on the same OS thread. When the I/O finishes, the runtime wakes it up. Many virtual threads or goroutines share a small pool of real OS threads.
The benefit: blocking-style code with the throughput of async. No colored functions, no callbacks, no await chains, no event loop in your stack traces. Just code that reads naturally.
This is the direction the industry is moving in. Python and JavaScript may eventually get something similar; today they still have async/await with all the function-coloring pain that goes with it.
Picking a model today
For a new service, the choice mostly depends on the language:
- Java. Virtual threads, Java 21 or newer. Blocking-style code, async-class throughput. Default choice for new services.
- Go. Goroutines. The same idea, more mature, and with no separate "virtual" concept because every goroutine already works this way.
- Python. async/await with FastAPI, aiohttp, and asyncio for I/O-heavy services. Threads for sync libraries or CPU-bound work. Free-threaded Python (the GIL-removal effort, PEP 703) is still maturing as a third option.
- JavaScript and Node. async/await is the only option because the runtime is single-threaded. Easier than Python in practice because almost the entire ecosystem is async.
- Rust. tokio plus async/await. Fast and mature, but has its own version of the colored-function problem.
For an existing service, switching from one model to the other is usually expensive. Pick the right one up front.
Implementations
Java 21 introduced virtual threads: lightweight threads scheduled by the JVM, not the OS. The code is blocking-style (executor.submit + blocking I/O); the JVM unmounts the virtual thread during I/O so other virtual threads can run on the same OS thread. Same code, async-like efficiency.
import java.util.concurrent.Executors;

// Old: platform threads, expensive (~1 MB stack each)
try (var pool = Executors.newFixedThreadPool(200)) {
    for (var url : urls) {
        pool.submit(() -> fetch(url));
    }
}

// New: virtual threads, cheap (~a few KB per thread)
try (var pool = Executors.newVirtualThreadPerTaskExecutor()) {
    for (var url : urls) {
        pool.submit(() -> fetch(url)); // blocking call is fine
    }
}

// Same code structure, but now 10,000+ concurrent virtual threads are feasible

Key points
- Blocking model: thread per request. Each thread has memory cost (pthreads default ~8 MB virtual / a few hundred KB committed; JVM default 512 KB-1 MB), thread-creation cost, and context-switch cost.
- Async model: an event loop reads ready I/O events and dispatches to coroutines. One thread can multiplex thousands of connections.
- Async needs the entire stack to be async: a blocking call inside an async function freezes the loop.
- Choose based on workload: many connections + small per-connection work = async. Few connections + heavy work = threads.
- Virtual threads (Java 21+, Go goroutines) blur the line: write blocking-style code, the runtime provides async-like efficiency.
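The "blocking call inside an async function freezes the loop" point can be measured in a few lines of asyncio (task names are made up for the sketch): three tasks that await overlap their waits, while three tasks that block serialize the whole loop.

```python
import asyncio
import time

async def polite(delay):
    await asyncio.sleep(delay)     # yields: other tasks run while waiting

async def rude(delay):
    time.sleep(delay)              # blocks: the whole loop freezes

async def timed(task, n=3, delay=0.1):
    t0 = time.perf_counter()
    await asyncio.gather(*(task(delay) for _ in range(n)))
    return time.perf_counter() - t0

concurrent = asyncio.run(timed(polite))   # ~0.1 s: the waits overlap
serial = asyncio.run(timed(rude))         # ~0.3 s: the waits stack up
print(concurrent < serial)
```

One `time.sleep` is enough to stall every connection on the loop, which is why sync calls hidden inside libraries are the classic async performance bug.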
Follow-up questions
- Why is async usually faster for I/O?
- When is blocking actually fine?
- Are virtual threads (Java) and goroutines (Go) async or blocking?
- What is the 'colored function' problem in async languages?
Gotchas
- Calling blocking code from async freezes the event loop.
- Mixing thread-based primitives (e.g. threading.Lock) with asyncio invites races; use asyncio.Lock inside coroutines.
- Per-thread memory cost adds up: 10K threads at a 1 MB stack each is ~10 GB of stack address space (mostly virtual).
- Virtual threads inherit thread-locals, but their stack traces look different from platform threads'.
- Async stack traces are harder to read (the event loop sits in the middle of them).