Priority Inversion

The setup in plain English

Three threads, ranked by urgency:

High is critical. Should run as soon as possible.
Medium is regular work. Will preempt anything below it.
Low is background. Lowest urgency.

One shared lock. Low takes the lock to do something quick. Then High needs the same lock and has to wait. So far this is normal: High will get the lock as soon as Low finishes.

Then Medium becomes runnable. High is blocked on the lock, so the OS only weighs Medium against Low, sees that Medium is more important, and switches the CPU to Medium. Low is now paused, still holding the lock. Medium runs and runs (it doesn't need the lock at all). High is stuck behind Low. Low is stuck behind Medium.

End result: High, the most critical thread, is blocked by Medium, which it should outrank. The order has inverted.

A timeline

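Three threads, one lock held by Low, no priority inheritance. Time flows top to bottom.

Low acquires the lock and enters its critical section.
High wakes up, tries to take the lock, and blocks.
Medium becomes runnable and preempts Low, which still holds the lock.
Medium keeps running. Low cannot run, so it cannot release the lock, so High keeps waiting.
Medium finally blocks or finishes. Low resumes, completes its critical section, and releases the lock.
High takes the lock and runs, long after it should have.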

High wanted to outrank Medium. Medium wanted to outrank Low. But Low was inside the critical section, and Medium doesn't care about the lock. So Medium runs ahead of High. Inversion.

The famous example

Mars Pathfinder, July 1997. The spacecraft landed, and a few days later its flight computer started resetting itself. The cause was exactly this pattern:

Low: a weather-data thread held a lock on the information bus.
High: a bus management thread waited for that lock.
Medium: a comms thread became runnable and preempted the weather thread.
The bus management thread missed its watchdog deadline. The watchdog reset the whole system.

NASA fixed it by flipping a flag on the mutex to enable priority inheritance, and pushed the patch to Mars over a slow link.

The fix: priority inheritance

When a high-priority thread starts waiting on a lock held by a lower-priority thread, the OS temporarily promotes the holder to the waiter's priority. The holder finishes, releases the lock, and drops back to its original priority.

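Without priority inheritance (the broken case again, summarised):

Low holds the lock. High blocks on it.
Medium becomes runnable and preempts Low.
High waits for as long as Medium keeps running.

With priority inheritance (the fix):

Low holds the lock. High blocks on it, and Low is promoted to High's priority.
Medium becomes runnable but cannot preempt the promoted Low.
Low finishes its critical section, releases the lock, and drops back to its own priority.
High takes the lock and runs before Medium.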

POSIX exposes this as the PTHREAD_PRIO_INHERIT protocol, set on a mutex attribute object with pthread_mutexattr_setprotocol(). Linux implements it with PI futexes. Real-time operating systems (VxWorks, FreeRTOS, RT Linux) support it natively. Java's standard locks do not. Go does not (goroutines have no priorities). Python's threading locks do not.

When this matters in normal code

Rarely. The classic setup needs three things together: real thread priorities, an OS scheduler that respects them strictly, and locks shared across priority levels. Most server code has none of those, so the textbook bug doesn't appear.

Tip

The same shape without priorities

Even without strict thread priorities, the same pattern appears: a fast operation stuck behind a slow one because they share a lock. A user request waiting on a logging path that hit a slow disk. A real-time audio thread waiting on a lock held by the UI thread while that thread is paused mid-GC. The fix generalises: don't share locks between workloads with different latency needs. Critical work should have its own queue, its own pool, its own lock, or no lock at all.

The lesson worth keeping is more general than the bug. When workloads of different urgency share the same lock or queue, the slowest one sets the floor for everyone else.

Follow-up questions

▸What is priority inheritance?

A scheduling rule: while a thread holds a lock that a higher-priority thread is waiting for, the holder is temporarily promoted to that priority. The medium-priority thread can no longer preempt it. Used in real-time OS kernels (VxWorks, RT Linux, FreeRTOS).

▸Does Java do priority inheritance?

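No. Java thread priorities are advisory: depending on the JVM and OS they may be mapped onto native scheduler priorities or ignored entirely, and the standard locks (synchronized, ReentrantLock) do not request priority inheritance. The practical advice for Java code is to avoid relying on thread priority for correctness.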

▸How was the Mars Pathfinder bug fixed?

The team enabled priority inheritance on the VxWorks mutex used by the bus management task. NASA uploaded the configuration change to the spacecraft remotely. After that, the high-priority task could no longer be indirectly blocked by medium-priority work.

▸When does this matter for normal application code?

Rarely. It comes up when thread priorities are set, the OS respects them strictly (real-time configurations), and locks are shared across priority levels. Most servers ignore priorities and the problem does not occur. The lesson generalises though: any time a fast operation can wait behind a slow one, the shape is inversion-like.
