Process vs Thread vs Goroutine

In one line

A process owns memory and resources; threads share the process's address space and are scheduled by the OS; goroutines are user-space tasks multiplexed onto a small pool of OS threads by the Go runtime.

Diagram

Three units, three cost profiles

The diagram above shows the containment chain: a process holds threads; a thread can run many goroutines or virtual threads. The cost picture in numbers:

Metric	Process	Thread	Goroutine / Virtual Thread
Memory per unit	~10 MB	~1 MB	~2 KB
Spawn cost	~ms	~μs	~μs
Switch cost	~10 μs	~5 μs	~200 ns
Memory isolation	full	shares with siblings	shares with siblings
Max practical	thousands	~10K	millions

The cost difference is the whole story. A web server that spawns one OS thread per request caps out around 10K concurrent connections, because each thread reserves about 1 MB of stack. The same server using goroutines or virtual threads can hold on the order of a million mostly idle connections on the same hardware.

Why it matters

Picking the wrong unit costs orders of magnitude:

A Python script using threads for CPU-bound work often runs no faster than the single-threaded version, and sometimes slower (the GIL serialises bytecode, and contention for it adds overhead).
A Java service that wraps every request in a platform thread can run out of memory at around 10K concurrent requests (~1 MB of stack each).
A Go service that spawns goroutines unboundedly under load can OOM (cheap to spawn ≠ free; the rate must still be bounded).
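The point about bounding the spawn rate applies to any cheap task type. As a minimal sketch in Java (assuming Java 21+ for virtual threads; the class name `BoundedSpawner`, the method `maxInFlight`, and the 5 ms sleep standing in for real work are all hypothetical), a semaphore can cap how many tasks are in flight at once:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; names are not from any library.
final class BoundedSpawner {

    // Runs `tasks` jobs but never more than `limit` at once. Returns the
    // highest number of jobs observed in flight at the same time.
    static int maxInFlight(int tasks, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                permits.acquireUninterruptibly();   // the submitter blocks at the limit
                executor.execute(() -> {
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(5);            // stand-in for real work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();          // frees a slot for the next task
                    }
                });
            }
        } // close() waits for the remaining tasks
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in flight: " + maxInFlight(200, 8));
    }
}
```

Acquiring the permit before spawning, rather than inside the task, means the number of live tasks is bounded too, not just the number doing work.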

Interviewers ask about this because the answer reveals whether the candidate understands the runtime in use or just memorised API names.

The M:N trick

Goroutines and virtual threads use M:N scheduling: M user-space tasks run on top of N OS threads, with M typically much larger than N. Whenever a task blocks (waiting on a channel, a lock, a syscall), the runtime parks it and reuses the OS thread to run another task. When the blocked task becomes runnable again, the runtime resumes it on any available OS thread, not necessarily the one it started on.
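One way to see the parking behaviour is to start far more blocking tasks than there are cores and check that they all finish in roughly one sleep period. The following is a sketch under assumptions: it needs Java 21+, and `ManyBlockedTasks` and `runBlockingTasks` are hypothetical names chosen for illustration.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; requires Java 21+.
final class ManyBlockedTasks {

    // Starts `tasks` virtual threads that each block for `pause` and
    // returns how many of them ran to completion.
    static int runBlockingTasks(int tasks, Duration pause) {
        AtomicInteger finished = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    Thread.sleep(pause);   // parks the virtual thread, frees the carrier
                    return finished.incrementAndGet();
                });
            }
        } // close() waits for every submitted task
        return finished.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000, Duration.ofMillis(200));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in " + elapsedMs + " ms on "
                + Runtime.getRuntime().availableProcessors() + " cores");
    }
}
```

If each sleep held an OS thread for its full duration, 10,000 tasks on a handful of carriers would take minutes rather than a fraction of a second. The exact elapsed time depends on the machine, so none is quoted here.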

Without M:N (thread-per-request):       With M:N (goroutine-per-request):

  10K requests = 10K OS threads          10K requests = 10K goroutines
                                                       on ~8 OS threads
  ~10 GB of RAM just for stacks          ~20 MB of RAM
  Most threads parked in kernel          Goroutines parked in user space
  Kernel scheduler thrashes              Runtime keeps cores busy

That's why "thread per request" is back in fashion via virtual threads (Java 21+). The model is the same; the cost is finally manageable.
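The stack figures in the comparison follow from simple multiplication. A small sketch of the arithmetic (the class `StackBudget` is hypothetical, and the per-task sizes are the approximate figures quoted in this section, not measurements):

```java
// Back-of-the-envelope stack budget; sizes are the section's rough figures.
final class StackBudget {
    static final long OS_THREAD_STACK_BYTES = 1L << 20;   // ~1 MB per OS thread
    static final long GOROUTINE_STACK_BYTES = 2L << 10;   // ~2 KB initial goroutine stack

    static long totalBytes(long tasks, long bytesPerTask) {
        return tasks * bytesPerTask;
    }

    public static void main(String[] args) {
        long tasks = 10_000;
        double gib = totalBytes(tasks, OS_THREAD_STACK_BYTES) / (double) (1L << 30);
        double mib = totalBytes(tasks, GOROUTINE_STACK_BYTES) / (double) (1L << 20);
        System.out.println("thread-per-request stacks: ~" + Math.round(gib) + " GiB");
        System.out.println("M:N task stacks:           ~" + Math.round(mib) + " MiB");
    }
}
```

Note that an OS thread's stack is mostly reserved address space rather than resident memory, so the first number is an upper bound on what the stacks can consume, not necessarily what a process monitor reports.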

Python is different

The CPython interpreter has a Global Interpreter Lock (GIL). Only one thread executes Python bytecode at a time, regardless of core count. Threads still help for I/O, because the GIL is released during blocking syscalls and inside many C extensions (NumPy operations, for example), but CPU-bound parallelism requires separate processes (multiprocessing) or a rewrite of the hot path in C/Rust. asyncio is the third option: single-threaded cooperative concurrency.

When to reach for what

CPU-bound work, Java/Go: pool of OS threads / goroutines sized to CPU count.
I/O-bound work, Java: virtual threads (or async for older Java).
I/O-bound work, Python: asyncio for high concurrency, threading for moderate.
CPU-bound work, Python: multiprocessing or rewrite hot path in C/Rust.
Hard isolation (crash containment, security boundary): separate processes.
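For the first rule in the list above, a CPU-bound pool sized to the core count might look like the sketch below. It uses only long-standing `java.util.concurrent` APIs; the class `CpuBoundPool` and the sum-of-squares workload are hypothetical stand-ins for a real computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: one platform thread per core for CPU-bound work.
final class CpuBoundPool {

    // Sums the squares of 0..n-1, splitting the range into one chunk per worker.
    static long sumOfSquares(int n) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (n + workers - 1) / workers;   // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                int from = Math.min(n, w * chunk);
                int to = Math.min(n, from + chunk);
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (long i = from; i < to; i++) sum += i * i;
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed or was interrupted", e);
        } finally {
            pool.shutdown();   // let the worker threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

More threads than cores would not speed this up, since every worker is always runnable; the extra threads would only add context switches.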

Warning

The most expensive bug: a leaked goroutine or thread doesn't crash the program; it slowly drains memory and file descriptors until something else does. Always know how every spawned concurrent task will exit.
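"Know how every task will exit" can be made concrete by giving each long-running worker an explicit stop signal and waiting for it to finish. A minimal sketch, assuming thread interruption is the agreed exit signal (the class `StoppableWorker` and its methods are hypothetical):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only: a worker whose exit path is explicit.
final class StoppableWorker {
    private final BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
    private final Thread thread = new Thread(this::drain, "stoppable-worker");

    void start() { thread.start(); }

    void send(String message) { inbox.add(message); }

    // The worker's only loop; it ends when the thread is interrupted.
    private void drain() {
        try {
            while (true) {
                String message = inbox.take();   // blocks; throws on interrupt
                System.out.println("handled " + message);
            }
        } catch (InterruptedException e) {
            // Interrupt is the exit signal: fall through and return.
        }
    }

    // Signals the worker, waits up to the timeout, and reports whether it exited.
    boolean stop(long timeoutMillis) {
        thread.interrupt();
        try {
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the caller's interrupt
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        StoppableWorker worker = new StoppableWorker();
        worker.start();
        worker.send("job-1");
        System.out.println("stopped cleanly: " + worker.stop(1_000));
    }
}
```

Having `stop` return whether the thread really exited gives the caller something to log or alert on, instead of assuming the worker is gone.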

Primitives by language

java.lang.Thread (platform thread, ~1 MB stack)
Thread.ofVirtual() (Java 21+, M:N scheduled)
Runnable / Callable
Executors.newVirtualThreadPerTaskExecutor()

Implementations

Platform thread vs Virtual thread

A platform thread is a thin wrapper around an OS thread (~1 MB stack, kernel-scheduled). A virtual thread is a JVM-managed user-space thread that is mounted on one of a small pool of carrier threads while it runs and unmounted when it blocks. The API is the same and the cost is far lower, so spawning a million is realistic.

public class ThreadKinds {
    public static void main(String[] args) throws InterruptedException {
        // Platform thread, backed by an OS thread
        Thread platform = new Thread(() -> {
            System.out.println("Platform: " + Thread.currentThread());
        });
        platform.start();
        platform.join();   // wait for it to finish

        // Virtual thread, JVM-managed (Java 21+)
        Thread virtual = Thread.ofVirtual().start(() -> {
            System.out.println("Virtual: " + Thread.currentThread());
        });
        virtual.join();
    }
}

One million virtual threads

newVirtualThreadPerTaskExecutor() makes the thread-per-request model viable again. Each task gets its own virtual thread; a blocking call parks the virtual thread without blocking its carrier.

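import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

// Fragment: place inside a method body (Java 21+)
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    IntStream.range(0, 1_000_000).forEach(i ->
        executor.submit(() -> {
            Thread.sleep(Duration.ofSeconds(1));   // parks the virtual thread
            return i;
        })
    );
} // auto-closed: close() waits for all submitted tasks to finish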

Key points

•Process: isolated address space, expensive context switch (~tens of μs), IPC required to share data
•OS thread: ~1 MB stack, context switch ~1–10 μs, kernel-scheduled
•Goroutine: ~2 KB initial stack (grows), context switch ~hundreds of ns, runtime-scheduled
•Java virtual threads (21+): JVM-managed, M:N model, millions per JVM
•Python threads share the GIL, only one executes Python bytecode at a time
•Prefer processes for CPU-bound parallelism in Python; threads for I/O

Tradeoffs

Option	Pros	Cons	When to use
OS Thread (Java platform / Python threading.Thread)	Simple mental model; direct OS scheduling; good for CPU-bound work in Java	~1 MB stack each; slow context switches; GIL caps parallelism in CPython	CPU-bound work in Java; I/O-bound work in Python
Virtual Thread (Java 21+)	Cheap (~few KB); millions per JVM; blocking calls are cheap	Pinning on synchronized blocks; newer, fewer tuned libraries	I/O-bound, high concurrency, thread-per-request servers
Goroutine	~2 KB initial stack; fast scheduling; first-class with channels	Easy to leak; no goroutine-local storage (use context)	Default unit of concurrency in Go
asyncio.Task / Process	asyncio: 100K+ concurrent I/O on one thread; Process: bypasses the GIL	asyncio: single-threaded, blocking calls stall the loop; Process: heavy, IPC overhead	asyncio for I/O-heavy services; multiprocessing for CPU-bound Python

Follow-up questions

▸How many goroutines can a Go program run?

Practically millions on modern hardware. Each starts with ~2 KB stack that grows as needed. Limited by available memory, not OS thread count.

▸What's the GIL and why does it matter?

Global Interpreter Lock: CPython serializes execution of Python bytecode across threads. Threads help for I/O (the GIL is released around blocking system calls) but not for CPU-bound work. Use multiprocessing for CPU parallelism.

▸Difference between virtual threads and goroutines?

Both use M:N scheduling onto OS threads. Goroutines are runtime-native and integrate with channels. Virtual threads are JVM-managed and work with existing blocking JDK APIs without rewrites.
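To illustrate the "existing blocking JDK APIs" point, the sketch below hands values between two virtual threads through a plain `ArrayBlockingQueue`, which plays a role loosely similar to a bounded Go channel. It assumes Java 21+; the class `QueueHandoff`, the method `sumThroughQueue`, and the `-1` end-of-stream sentinel are hypothetical choices for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch only; requires Java 21+.
final class QueueHandoff {
    private static final int DONE = -1;   // sentinel marking the end of the stream

    // A producer sends 1..count through a bounded queue; a consumer sums them.
    static long sumThroughQueue(int count) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        long[] total = {0};

        Thread producer = Thread.ofVirtual().start(() -> {
            try {
                for (int i = 1; i <= count; i++) queue.put(i);   // blocks when full
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = Thread.ofVirtual().start(() -> {
            try {
                for (int v = queue.take(); v != DONE; v = queue.take()) total[0] += v;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            producer.join();
            consumer.join();   // join makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sumThroughQueue(100));
    }
}
```

The queue code is the same as it would be with platform threads; only the way the threads are created changes.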

▸Why does Go limit OS threads but not goroutines?

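GOMAXPROCS sets the number of logical processors (P's), which caps how many OS threads execute Go code simultaneously. Threads blocked in system calls do not count against it, so the runtime may create more OS threads than GOMAXPROCS. Goroutines are multiplexed onto the P's, which is why their number is bounded by memory rather than by thread count. Default = number of logical CPUs.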

▸Can a goroutine outlive the function that started it?

Yes. A goroutine runs until its own function returns, independently of the function that started it. When main() returns, the program exits and all remaining goroutines are killed. Otherwise a goroutine can run forever, which is how leaks happen.

Gotchas

!Goroutines started in a loop without context cancellation are a common source of leaks in production Go services
!On Java 21, virtual threads pin to their carrier while blocked inside synchronized blocks; prefer ReentrantLock for I/O-heavy code
!Python threads do NOT give CPU parallelism; measure, and switch to multiprocessing if compute-bound
!Mixing asyncio and threading is a footgun; use asyncio.to_thread() for blocking calls
!runtime.NumGoroutine() growing unbounded over time = leak; alert on it
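The ReentrantLock advice from the list above can be sketched as follows. This assumes Java 21+, where blocking inside a `synchronized` block pins the virtual thread to its carrier; the class `LockedCounter` and the 1 ms sleep standing in for a blocking call are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only; requires Java 21+.
final class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    void increment() {
        lock.lock();   // a virtual thread waiting here parks without pinning
        try {
            Thread.sleep(1);   // stand-in for a short blocking call under the lock
            value++;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.unlock();
        }
    }

    long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Increments the counter once from each of `tasks` virtual threads.
    static long countWithVirtualThreads(int tasks) {
        LockedCounter counter = new LockedCounter();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) executor.execute(counter::increment);
        } // close() waits for all increments
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithVirtualThreads(100));
    }
}
```

Holding any lock across a blocking call still serialises the callers, so the critical section should stay as short as the invariant allows.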

Common pitfalls

Confusing concurrency (multiple tasks in progress) with parallelism (multiple tasks running simultaneously)
Assuming Python threads parallelize CPU work; they don't, because the GIL prevents it
Using multiprocessing for I/O-bound work, where process startup and IPC overhead buy nothing because the tasks mostly wait

APIs worth memorising

Java: Thread.ofVirtual(), Executors.newVirtualThreadPerTaskExecutor()
Python: threading.Thread, multiprocessing.Process, asyncio.create_task(), concurrent.futures
Go: go keyword, runtime.GOMAXPROCS, runtime.NumGoroutine, context.Context

Where this shows up

Every modern Java/Go service. Spring Boot 3.2+ supports virtual threads via spring.threads.virtual.enabled=true (opt-in, not default). Go's net/http spawns a goroutine per connection. Python ASGI servers such as uvicorn (which commonly hosts FastAPI and can host Django's async views) use asyncio for I/O concurrency.
