Deadlock, Livelock, Starvation
Deadlock: threads block each other forever. Livelock: threads are running but making no progress. Starvation: one thread never gets the resource it needs because others keep grabbing it first. All three look similar from the outside; the fix is different for each.
Three failures that look alike from the outside
All three make a service hang or grind to a halt. The causes and fixes are completely different. Mixing them up costs days of debugging.
| Failure | CPU usage while stuck | Are threads doing anything? | Who's stuck? |
|---|---|---|---|
| Deadlock | ~0% | No, all parked | Everyone in the cycle |
| Livelock | ~100% | Yes, but the work cancels out | Everyone in the dance |
| Starvation | Normal | Yes, other threads are fine | One specific thread |
Symptom-to-cause map: 100% CPU + zero throughput = livelock. 0% CPU + nothing moves = deadlock. Throughput looks normal but one user/job never gets served = starvation.
Deadlock: the circular wait
The classic two-thread deadlock: each thread holds one lock and waits for the one the other holds.
Both threads wait. Neither can release. Forever.
Coffman's 1971 analysis: a deadlock requires four conditions, all at the same time.
1. Mutual exclusion: at least one resource can be held by only one thread at a time
2. Hold-and-wait: a thread holds one resource while waiting for another
3. No preemption: resources are released only voluntarily, never forcibly taken
4. Circular wait: there is a cycle in the wait-for graph
Break any single one and deadlock becomes impossible. The cheapest break in production code is #4: enforce a global lock acquisition order. If every thread always acquires Lock 1 before Lock 2, no cycle can form.
Livelock: the hallway dance
Two threads detect contention and both back off "to be nice." Their backoffs happen at the same time, so when they retry they collide again. They keep "yielding" to each other forever.
The hallway analogy: two people meet in a corridor. Both step left to let the other pass. They bump. Both step right. They bump. Repeat forever, neither moving.
```
TIME ------------------------------------------------------------------>
Thread A:  "you go first"   sleep 10ms   retry   "you go first"   sleep 10ms  ...
Thread B:  "no, after you"  sleep 10ms   retry   "no, after you"  sleep 10ms  ...
            ^                                     ^
            both yield at the same time           both yield again at the same time
```
Result: no thread blocks (CPU stays at 100%), but no thread makes progress either.
The fix is random jitter on the retry. Each thread sleeps for base + random(0, base) instead of a fixed base. The randomness breaks the symmetry, so on the next collision one thread retries before the other and wins the lock. Exponential backoff with jitter is the canonical pattern.
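A minimal sketch of backoff with jitter, assuming the contended work sits behind a ReentrantLock; the helper name and retry bounds are illustrative, not from the original:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class BackoffRetry {
    // Try to run 'action' under 'lock'. On contention, sleep base + random(0, base)
    // and double the base, so colliding threads desynchronise instead of retrying in lockstep.
    static boolean runWithBackoff(ReentrantLock lock, Runnable action) throws InterruptedException {
        long base = 1; // milliseconds
        for (int attempt = 0; attempt < 10; attempt++) {
            if (lock.tryLock()) {
                try {
                    action.run();
                    return true;
                } finally {
                    lock.unlock();
                }
            }
            long jitter = ThreadLocalRandom.current().nextLong(base + 1); // random(0, base)
            TimeUnit.MILLISECONDS.sleep(base + jitter);
            base = Math.min(base * 2, 100); // exponential growth, capped
        }
        return false; // caller decides what to do after bounded retries
    }
}
```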
Starvation: the never-picked thread
A thread is ready to run but the scheduler or lock keeps picking other threads. Common causes:
- Priority inversion: a high-priority thread waits on a lock held by a low-priority thread, which gets preempted by medium-priority threads. (See the priority-inversion lesson and the Mars Pathfinder story.)
- Unfair locks: a newcomer barges in front of queued waiters.
- Reader-preference RW locks: writers starve while readers keep arriving. (Both lock-fairness causes are sketched after this list.)
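For the two lock-related causes, the standard JDK knob is the fairness flag; a brief illustration (the field names are mine):

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class FairnessExamples {
    // Default (unfair) lock: a newly arriving thread may barge ahead of queued waiters.
    static final ReentrantLock unfairLock = new ReentrantLock();

    // Fair lock: the longest-waiting thread acquires next. Prevents barging-based
    // starvation at some throughput cost.
    static final ReentrantLock fairLock = new ReentrantLock(true);

    // Fair read-write lock: a waiting writer blocks later readers, so a steady
    // stream of readers can no longer starve writers.
    static final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true);
}
```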
The Mars Pathfinder bug (1997)
The Pathfinder spacecraft hit a priority-inversion bug: a low-priority weather-data thread held a lock that a high-priority bus-management thread needed, and a medium-priority communications thread kept preempting the low-priority one. The bus thread missed its deadlines and the watchdog kept rebooting the system. NASA debugged it remotely from Earth.
How to detect them in production
| Failure | Detection |
|---|---|
| Deadlock (Java) | jstack <pid> shows "Found X deadlocked threads" |
| Deadlock (Go) | All-asleep panic, runtime/pprof goroutine profile |
| Deadlock (Python) | faulthandler.dump_traceback_later, py-spy dump |
| Livelock | High CPU, low/no throughput on metrics, pprof shows hot retry loops |
| Starvation | Tail latency spikes for specific request types; thread/queue length metrics |
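On the JVM, the same check jstack performs can also be run in-process, for example from a health-check endpoint; a minimal sketch using ThreadMXBean (the surrounding class and method names are illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class DeadlockProbe {
    // Logs any threads currently deadlocked on monitors or ownable synchronizers.
    static void logDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null if no deadlock found
        if (ids == null) return;
        for (ThreadInfo info : threads.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.err.println("Deadlocked: " + info.getThreadName()
                    + " waiting on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}
```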
The interview rule of thumb
If asked how to debug a hang, check in this order: thread/goroutine dumps for blocked threads (deadlock), then CPU and retry counts (livelock), then tail latency by request type (starvation). The dump usually reveals which one it is.
Implementations
Thread A calls transfer(a, b): it locks a, then tries to lock b. Thread B simultaneously calls transfer(b, a): it locks b, then tries to lock a. Each holds what the other needs. Both block forever.
```java
import java.util.concurrent.locks.ReentrantLock;

class Account {
    private final ReentrantLock lock = new ReentrantLock();
    int balance;

    static void transfer(Account from, Account to, int amount) {
        from.lock.lock();
        try {
            to.lock.lock(); // ← if another thread holds 'to' and waits for 'from', deadlock
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally { to.lock.unlock(); }
        } finally { from.lock.unlock(); }
    }
}
```
Always acquire locks in the same order across the program. Any total ordering works; here, the account's identity hash. Both threads now acquire the lower-ranked lock first, so a circular wait cannot form.
```java
static void transfer(Account from, Account to, int amount) {
    // Order the two locks by identity hash so every thread acquires them in the
    // same global order. (On the rare hash collision, production code falls back
    // to a single tie-breaking lock.)
    Account first = System.identityHashCode(from) < System.identityHashCode(to) ? from : to;
    Account second = first == from ? to : from;
    first.lock.lock();
    try {
        second.lock.lock();
        try {
            from.balance -= amount;
            to.balance += amount;
        } finally { second.lock.unlock(); }
    } finally { first.lock.unlock(); }
}
```
Instead of preventing the cycle, break out of it: if the second lock isn't available within a timeout, release everything, back off, and retry. This breaks the 'no preemption' Coffman condition.
```java
// needs: import java.util.concurrent.TimeUnit
static boolean transfer(Account from, Account to, int amount) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(100);
    while (System.nanoTime() < deadline) {
        if (from.lock.tryLock(10, TimeUnit.MILLISECONDS)) {
            try {
                if (to.lock.tryLock(10, TimeUnit.MILLISECONDS)) {
                    try {
                        from.balance -= amount;
                        to.balance += amount;
                        return true;
                    } finally { to.lock.unlock(); }
                }
            } finally { from.lock.unlock(); }
        }
        Thread.sleep((long) (Math.random() * 5)); // jitter
    }
    return false;
}
```
Key points
- Deadlock = circular wait: Thread A holds X and waits for Y, Thread B holds Y and waits for X
- Coffman conditions: mutual exclusion + hold-and-wait + no preemption + circular wait
- Break ANY ONE of the four → no deadlock
- Lock ordering (always acquire locks in a fixed global order) is the most common deadlock prevention
- Livelock = threads keep retrying and stepping on each other: no blocking, no progress
- Starvation = unfair scheduling: a thread is permanently outrun by others
Follow-up questions
- What are the four Coffman conditions for deadlock?
- What's the difference between deadlock and livelock?
- How is deadlock prevented in practice?
- Why does Java's ReentrantLock have tryLock while Go's sync.Mutex didn't until Go 1.18?
- What's starvation?
Gotchas
- Reentrant locks (Java ReentrantLock, Python RLock) prevent self-deadlock but NOT mutual deadlock between threads
- Holding a lock while calling user-supplied code (a callback, a remote service) is a deadlock waiting to happen: that code may try to take a lock that is already held (sketched after this list)
- Go's runtime detects only 'all goroutines asleep' deadlocks; partial deadlocks (some goroutines stuck, others running) are silent
- Java's jstack and -XX:+PrintConcurrentLocks show deadlocked threads; use them on production hangs
- Database transactions can deadlock too: two transactions taking row locks (e.g. SELECT ... FOR UPDATE) on the same rows in different orders are the SQL version
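A sketch of the callback-under-lock gotcha, using a hypothetical EventBus to make it concrete:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class EventBus {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    public synchronized void subscribe(Consumer<String> l) { listeners.add(l); }

    // Risky: listener code runs while the bus's own lock is held. If a listener
    // takes another lock that a second thread holds while waiting for this bus,
    // the two threads form a circular wait.
    public synchronized void publish(String event) {
        for (Consumer<String> l : listeners) {
            l.accept(event);
        }
    }

    // Safer shape: copy the listener list under the lock, invoke callbacks outside it.
    public void publishSafely(String event) {
        List<Consumer<String>> snapshot;
        synchronized (this) { snapshot = new ArrayList<>(listeners); }
        for (Consumer<String> l : snapshot) {
            l.accept(event); // no lock held while user code runs
        }
    }
}
```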
Common pitfalls
- Adding more locks to 'fix' a race often introduces deadlock instead
- Acquiring locks in different orders on different code paths: the most common deadlock cause
- Using sleep-based backoff without jitter → livelock under contention
- Spinning on a try-lock without backoff → CPU burn livelock
Practice problems
Cycle detection in a directed graph (DFS or topological sort)
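A sketch of the DFS at the heart of a wait-for-graph deadlock detector; the graph is modelled here as a map from each thread to the threads it waits on (node names and shapes are illustrative):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WaitForGraph {
    // Returns true if the wait-for graph contains a cycle, i.e. a deadlock.
    static boolean hasCycle(Map<String, List<String>> waitsFor) {
        Set<String> onPath = new HashSet<>(); // grey: on the current DFS path
        Set<String> done = new HashSet<>();   // black: fully explored
        for (String node : waitsFor.keySet()) {
            if (dfs(node, waitsFor, onPath, done)) return true;
        }
        return false;
    }

    private static boolean dfs(String node, Map<String, List<String>> g,
                               Set<String> onPath, Set<String> done) {
        if (onPath.contains(node)) return true; // back edge => cycle => deadlock
        if (done.contains(node)) return false;
        onPath.add(node);
        for (String next : g.getOrDefault(node, List.of())) {
            if (dfs(next, g, onPath, done)) return true;
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }
}
```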
5 philosophers, 5 forks, classic deadlock setup. Solutions: enforce odd philosophers grab left first, even grab right first; or limit to 4 philosophers eating at once.
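A sketch of the odd/even ordering fix described above; the class shape and method names are mine, and eat() is a placeholder:

```java
import java.util.concurrent.locks.ReentrantLock;

class DiningPhilosophers {
    private final ReentrantLock[] forks;

    DiningPhilosophers(int n) {
        forks = new ReentrantLock[n];
        for (int i = 0; i < n; i++) forks[i] = new ReentrantLock();
    }

    // Even seats grab their right fork first, odd seats their left, so the
    // wait-for graph can never close into a full cycle (breaks circular wait).
    void dine(int seat) {
        int left = seat;
        int right = (seat + 1) % forks.length;
        ReentrantLock first  = (seat % 2 == 0) ? forks[right] : forks[left];
        ReentrantLock second = (seat % 2 == 0) ? forks[left]  : forks[right];
        first.lock();
        try {
            second.lock();
            try {
                eat(seat);
            } finally { second.unlock(); }
        } finally { first.unlock(); }
    }

    private void eat(int seat) { /* placeholder for the actual work */ }
}
```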
APIs worth memorising
- Java: ReentrantLock.tryLock(timeout), jstack, ThreadMXBean.findDeadlockedThreads()
- Python: threading.Lock.acquire(timeout), threading.Lock.acquire(blocking=False)
- Go: select with timeout case, sync.Mutex.TryLock (1.18+)
Database systems run deadlock detectors (cycle detection in the wait-for graph) and abort one of the transactions to break the cycle. The JVM never does this automatically, but jstack and ThreadMXBean can detect monitor deadlocks on demand. Production hangs are almost always one of: deadlock, network timeout, or infinite loop.