Bug Hunt: Why Does My RWMutex Deadlock When I Re-enter?
RWMutex is not reentrant. A goroutine holding the read lock that calls a method which also tries to acquire the read lock can deadlock if a writer is waiting in between. Java's ReentrantReadWriteLock is reentrant by design; sync.RWMutex is intentionally not. The fix is to restructure so locks aren't reentered.
The puzzle
A cache that supports get(key) and a getCount() method that totals all values. Both methods take the read lock. Looks fine. Tests pass. Under low load, production behaves. Then a write operation happens during a getCount() and the whole service hangs.
What's special about the read-then-read-during-write interleaving?
The single-thread surprise
The deadlock isn't between two different threads competing for two locks. It's a single thread re-acquiring the same lock with a writer waiting in between. People don't expect a deadlock in single-threaded re-entry, but the writer-preference rule of RWMutex makes it happen.
What to look for in the broken code
Read the language tab. The suspicious pattern: a function takes the lock, then calls another method on the same struct that also takes the lock. Without writers, this is fine on Go's RWMutex (multiple readers allowed). With writers in the mix:
- Goroutine A calls getCount() → acquires RLock #1.
- Goroutine B calls Set() → calls Lock() → blocks because RLock #1 is held.
- Goroutine A's loop iterates → calls getOrLoad() → tries RLock #2.
- RLock #2 blocks because a writer is queued (writer preference).
- Both goroutines are now blocked. Deadlock.
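This interleaving can be reproduced on the JVM without hanging the process, because StampedLock also queues new readers behind a waiting writer. A minimal sketch (the class and method names are illustrative, and the timed tryReadLock stands in for the blocking re-entrant read so the demo fails fast instead of deadlocking):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

public class ReentryDemo {
    // Returns true if the second (re-entrant) read attempt was blocked
    // by a queued writer, i.e. the deadlock condition was reached.
    static boolean secondReadBlocks() throws InterruptedException {
        StampedLock lock = new StampedLock();

        long first = lock.readLock();         // step 1: this thread holds RLock #1

        Thread writer = new Thread(() -> {
            long w = lock.writeLock();        // step 2: writer blocks and queues
            lock.unlockWrite(w);
        });
        writer.start();
        Thread.sleep(100);                    // give the writer time to enqueue

        // step 3: the same thread tries RLock #2; with a writer queued, the
        // timed attempt waits out the deadline and returns 0 (failure)
        long second = lock.tryReadLock(200, TimeUnit.MILLISECONDS);
        boolean blocked = (second == 0L);
        if (!blocked) lock.unlockRead(second);

        lock.unlockRead(first);               // release so the writer can finish
        writer.join();
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(secondReadBlocks() ? "re-entry blocked" : "re-entry succeeded");
    }
}
```

A blocking readLock() at step 3 would hang exactly like the Go version; the timeout only makes the condition observable.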
Why writer-pref exists
If new readers could acquire while a writer is queued, writers could starve forever (a steady stream of readers means the writer never gets the lock). Writer preference breaks the cycle by saying: once a writer waits, no new readers may enter. It also creates this re-entry deadlock as a side effect.
The fix patterns
| Approach | Notes |
|---|---|
| Use a reentrant lock | Java's ReentrantReadWriteLock, Python's RLock, easiest. Small overhead. |
| Internal "Locked" helpers | Method named getXLocked() that assumes the caller holds the lock. No re-entry. |
| Snapshot + unlock | Acquire lock, copy data, release, then process. Best when the inner work is heavy. |
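The snapshot + unlock row is the only pattern not shown in the Implementations section below, so here is a sketch (class and method names are illustrative; StampedLock stands in for any non-reentrant RW lock):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

class SnapshotCache {
    private final StampedLock lock = new StampedLock();
    private final Map<String, Integer> data = new HashMap<>();

    void set(String key, int value) {
        long stamp = lock.writeLock();
        try { data.put(key, value); }
        finally { lock.unlockWrite(stamp); }
    }

    int getCount() {
        Map<String, Integer> snapshot;
        long stamp = lock.readLock();
        try {
            snapshot = new HashMap<>(data);   // copy under the lock...
        } finally { lock.unlockRead(stamp); } // ...then release immediately

        int total = 0;                        // the heavy work runs lock-free,
        for (int v : snapshot.values()) {     // so no re-entry is possible
            total += v;
        }
        return total;
    }
}
```

The copy costs memory proportional to the map, so this pays off when the per-entry work is expensive relative to the copy, and when a slightly stale total is acceptable.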
Best practice
Prefer the internal helper pattern. It makes the lock contract explicit at every call site (xxxLocked means "the caller must hold the lock"). Reentrant locks paper over the design issue without fixing it.
How to spot this in code review
The smell
When one method takes a lock and calls another method on the same object that also takes a lock, pause. Ask:
- Is the lock reentrant? (Go RWMutex: no. Python Lock: no. Java ReentrantLock: yes.)
- If non-reentrant, can a writer arrive between the two reads? (Almost always yes in production.)
- Can the code be refactored to avoid re-entry?
The deadlock is invisible until the right load arrives. Catch it in review, not at 3 a.m.
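The first question on that checklist can even be answered empirically: a non-blocking try-acquire from a thread that already holds the primitive reveals whether it is reentrant. A sketch (the class name is illustrative; Semaphore and ReentrantLock are just two examples):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class ReentrancyCheck {
    static boolean reentrantLockIsReentrant() {
        ReentrantLock rl = new ReentrantLock();
        rl.lock();
        boolean again = rl.tryLock(); // true: the owning thread may re-acquire
        if (again) rl.unlock();
        rl.unlock();
        return again;
    }

    static boolean semaphoreIsReentrant() {
        Semaphore sem = new Semaphore(1);
        sem.acquireUninterruptibly();
        boolean again = sem.tryAcquire(); // false: a Semaphore has no owner concept
        if (again) sem.release();
        sem.release();
        return again;
    }

    public static void main(String[] args) {
        System.out.println("ReentrantLock reentrant: " + reentrantLockIsReentrant());
        System.out.println("Semaphore reentrant:     " + semaphoreIsReentrant());
    }
}
```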
Implementations
In Java, ReentrantReadWriteLock IS reentrant by design (the name says so), and many engineers conclude that all Java locks are reentrant. That's wrong: Semaphore.acquire() is NOT reentrant, and StampedLock is NOT reentrant. Same trap, different primitive.
```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

class Cache {
    private final StampedLock lock = new StampedLock(); // NOT reentrant
    private final Map<String, Integer> data = new HashMap<>();

    int getOrLoad(String key) {
        long stamp = lock.readLock();       // ← second readLock
        try {
            return data.getOrDefault(key, 0);
        } finally { lock.unlockRead(stamp); }
    }

    int getCount() {
        long stamp = lock.readLock();       // ← first readLock
        try {
            int total = 0;
            for (String k : data.keySet()) {
                total += getOrLoad(k);      // ← StampedLock NOT reentrant:
            }                               //   deadlocks under a waiting writer
            return total;
        } finally { lock.unlockRead(stamp); }
    }
}
```

The fix: choose a reentrant lock if the pattern is needed, or refactor to lock once. ReentrantReadWriteLock allows the same thread to re-acquire, at a small overhead cost. StampedLock is faster but doesn't allow re-entry. Pick based on whether the code needs reentrant semantics.
```java
// Fix #1: use ReentrantReadWriteLock (reentrant by design)
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class Cache {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Integer> data = new HashMap<>();

    int getOrLoad(String key) {
        lock.readLock().lock();
        try {
            return data.getOrDefault(key, 0);
        } finally { lock.readLock().unlock(); }
    }

    int getCount() {
        lock.readLock().lock();
        try {
            int total = 0;
            for (String k : data.keySet()) {
                total += getOrLoad(k); // OK: reentrant
            }
            return total;
        } finally { lock.readLock().unlock(); }
    }
}
```

```java
// Fix #2: refactor with an internal "already-locked" helper
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

class Cache {
    private final StampedLock lock = new StampedLock();
    private final Map<String, Integer> data = new HashMap<>();

    private int getOrLoadLocked(String key) { // caller must hold the read lock
        return data.getOrDefault(key, 0);
    }

    int getCount() {
        long stamp = lock.readLock();
        try {
            int total = 0;
            for (String k : data.keySet()) total += getOrLoadLocked(k);
            return total;
        } finally { lock.unlockRead(stamp); }
    }
}
```

Key points
- sync.RWMutex (Go): NOT reentrant; RLock-then-Lock or RLock-then-RLock can deadlock
- ReentrantReadWriteLock (Java): IS reentrant; the same thread can re-acquire its own lock
- Common trap: a reader calls a method that also reads; a writer arrives between them; deadlock
- Fix: restructure to acquire the lock once at the outer scope, or pass an already-locked context
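The RLock-then-Lock variant (a lock upgrade) fails the same way: the write request waits on a read lock held by the requesting thread itself. On StampedLock a naive upgrade can never succeed, but the class does offer tryConvertToWriteLock as a supported path. A sketch (the class name is illustrative):

```java
import java.util.concurrent.locks.StampedLock;

public class UpgradeDemo {
    // Requesting the write lock while holding our own read lock can never
    // succeed: tryWriteLock() fails immediately (a blocking writeLock()
    // here would deadlock against ourselves).
    static boolean naiveUpgradeFails() {
        StampedLock lock = new StampedLock();
        long r = lock.readLock();
        long w = lock.tryWriteLock();  // 0: a read lock (ours!) is held
        lock.unlockRead(r);
        return w == 0L;
    }

    // The supported path: convert the read stamp to a write stamp.
    // Succeeds here because we are the only reader.
    static boolean conversionSucceeds() {
        StampedLock lock = new StampedLock();
        long stamp = lock.readLock();
        long w = lock.tryConvertToWriteLock(stamp);
        if (w != 0L) { lock.unlockWrite(w); return true; }
        lock.unlockRead(stamp);
        return false;
    }
}
```

Go's sync.RWMutex has no conversion API at all; there the only options are the restructuring patterns above.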
Follow-up questions
- Why is sync.RWMutex (Go) intentionally non-reentrant?
- When does the RWMutex deadlock actually fire?
- Should ReentrantLock be the default?
- How does this differ from a normal deadlock?
Gotchas
- Go: sync.RWMutex is writer-preferring; once Lock() is called, new RLocks block
- Java: StampedLock and Semaphore are NOT reentrant; ReentrantLock and ReentrantReadWriteLock are
- Python: threading.Lock is NOT reentrant; threading.RLock IS
- Calling external code (callbacks, listeners) while holding a lock can introduce re-entry
- ReentrantLock is fine for nested calls; using one to call into completely unknown code is still risky (it might block on something else)
Cache implementations are the #1 victim. Anything where a "getter" calls another "getter" on the same struct, both protected by RW locks, eventually hits this. The Go documentation for sync.RWMutex explicitly prohibits recursive read locking for exactly this reason.