
File Locking (Advisory & Mandatory)


Mental Model

Three ways to put a "reserved" sign on a restaurant table. flock() puts the sign on the whole table, and it stays as long as the person who placed it keeps sitting (tied to the fd). POSIX fcntl() reserves individual chairs, but the sign is tied to the person, not the seat. If that person pulls up any other chair at the same table and then gets up from it, ALL their signs at that table vanish, including the original. OFD locks fix this: the sign stays on the specific seat no matter what else happens. All three are advisory. A rude guest who ignores signs can sit wherever they want.


The Problem

Two instances of a batch job run at the same time, both writing to the same output file. Writes interleave. File corrupted. A developer adds POSIX fcntl() locks -- problem solved, until a library somewhere in the process opens the same file for a quick stat check and closes it. That close() silently releases every POSIX lock the process held on that file, even locks on a completely different fd. Corruption returns with no error, no warning, no hint that the lock evaporated.

Architecture

Two cron jobs fire at the same time. Both open the same report file. Both start writing. The result? The first half of line 47 is from job A, the second half is from job B, and the file is now corrupted beyond repair.

This is the problem file locking solves. But here's what makes it treacherous: Linux provides three different locking mechanisms, and they have radically different rules about what happens on fork(), on close() of an unrelated fd, and when code runs on NFS instead of a local disk.

Pick the wrong one and the locks silently evaporate. No error. No warning. Just data corruption that shows up at 3 AM.

What Actually Happens

flock(): whole-file, fd-owned. The lock is owned by the open file description (struct file). All fds created via dup() or inherited via fork() share the same struct file, so they share the same lock. Closing one fd doesn't release the lock — only when ALL fds to that struct file are closed (or the process exits) does the lock release. This makes flock() relatively predictable for whole-file mutual exclusion.
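A minimal sketch of those flock() rules in C, assuming Linux and glibc. The helper names and the probing approach are illustrative, not a standard API. The probe works because a separate open() creates a separate open file description, and flock() treats that as a separate owner even inside the same process.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* 1 if a brand-new open() of path can take LOCK_EX right now, 0 if it would block, -1 on error. */
static int fresh_open_can_flock(const char *path) {
    int fd = open(path, O_RDONLY);        /* flock() does not care about the access mode */
    if (fd < 0) return -1;
    int result;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        result = 1;                       /* got it; the close() below releases it again */
    else
        result = (errno == EWOULDBLOCK) ? 0 : -1;
    close(fd);                            /* independent description: releases only its own lock */
    return result;
}

/* Walks through flock() ownership: the lock belongs to the open file description. */
static int flock_ownership_demo(const char *path, int *blocked_while_held,
                                int *blocked_after_dup_closed) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_EX) != 0) { close(fd); return -1; }

    int dupfd = dup(fd);                  /* same struct file, so it shares the lock */
    *blocked_while_held = (fresh_open_can_flock(path) == 0);

    close(dupfd);                         /* fd still references the description: lock stays */
    *blocked_after_dup_closed = (fresh_open_can_flock(path) == 0);

    close(fd);                            /* last reference gone: the kernel drops the lock */
    return fresh_open_can_flock(path);    /* 1 means the lock is free again */
}
```

Called on a scratch file, the sketch should report the lock as held both before and after the dup()'d fd is closed, and free (return value 1) once the last fd is gone.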

POSIX fcntl() locks: byte-range, pid-owned. Locks are owned by the (pid, inode) pair. This creates a devastating behavior: if a process opens the same file via any path, on any fd, and closes that fd, ALL POSIX locks held by that process on that inode are silently released. A library acquires a lock on fd 5. Some unrelated code in the same process opens the same file on fd 8 for a quick read and closes it. The lock on fd 5 is gone. No error. This has caused real data corruption in production systems.

OFD locks (F_OFD_SETLK, Linux 3.15+): byte-range, fd-owned. The best of both worlds: byte-range support from POSIX locks, struct-file ownership from flock(). Closing an unrelated fd does NOT release OFD locks. Fds created by dup() or inherited across fork() refer to the same open file description, so they share the lock, exactly as with flock(). The API is the familiar struct flock used with the F_OFD_SETLK, F_OFD_SETLKW, and F_OFD_GETLK commands; the one extra rule is that l_pid must be set to 0. OFD locks should be the default choice for any new code that needs record locking.
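The close() footgun and its OFD fix can be compared side by side in a short C sketch, assuming Linux 3.15+ and glibc (hence _GNU_SOURCE for F_OFD_SETLK). The function names and /tmp paths are hypothetical. A process never conflicts with its own POSIX locks, so the check forks a child that opens the file afresh and tries to lock it; POSIX and OFD locks do conflict with each other, so the same probe works for both kinds.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that opens path afresh and tries a non-blocking whole-file write lock.
 * Returns 1 if the child got it, 0 if an existing lock blocked it, -1 on error. */
static int other_process_can_lock(const char *path) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        if (fd < 0) _exit(2);
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 };
        _exit(fcntl(fd, F_SETLK, &fl) == 0 ? 0 : 1);   /* exiting releases anything it got */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}

/* Locks the whole file via fd1 using cmd (F_SETLK or F_OFD_SETLK), then lets "unrelated"
 * code open and close a second fd. Returns 1 if the lock survived, 0 if it vanished, -1 on error. */
static int lock_survives_unrelated_close(const char *path, int cmd) {
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    if (fd1 < 0) return -1;
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0, .l_pid = 0 };   /* l_pid must be 0 for OFD */
    if (fcntl(fd1, cmd, &fl) != 0 || other_process_can_lock(path) != 0) {
        close(fd1);
        return -1;                          /* the lock was never effectively held */
    }

    int fd2 = open(path, O_RDONLY);         /* e.g. a library doing a quick check */
    if (fd2 >= 0) close(fd2);               /* POSIX: drops every lock this process holds on the inode */

    int can = other_process_can_lock(path);
    close(fd1);
    return can < 0 ? -1 : !can;
}
```

With F_SETLK the function should return 0 (the lock vanished when the second fd was closed); with F_OFD_SETLK it should return 1.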

How conflict checking works. When a new lock is requested, the kernel walks the locks already attached to the inode (since Linux 4.0 these hang off inode->i_flctx, with separate lists for POSIX/OFD locks and flock() locks; older kernels kept a single inode->i_flock list), checking for overlapping byte ranges with conflicting types. Shared (read) locks conflict with exclusive (write) locks, but not with each other. If there's a conflict, F_SETLK fails immediately with EAGAIN (POSIX also permits EACCES), while F_SETLKW blocks the process on a wait queue until the conflicting lock is released.
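A small sketch of these conflict rules, assuming Linux 3.15+. It uses OFD locks so that two open() calls in one process act as two independent owners, which keeps the example free of fork(). try_ofd_lock() and conflict_demo() are illustrative names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Non-blocking OFD lock request on [start, start+len). Returns 0 or errno (EAGAIN on conflict). */
static int try_ofd_lock(int fd, short type, off_t start, off_t len) {
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                        .l_start = start, .l_len = len, .l_pid = 0 };
    return fcntl(fd, F_OFD_SETLK, &fl) == 0 ? 0 : errno;
}

/* Two open() calls give two open file descriptions, i.e. two independent OFD owners,
 * so their locks are checked against each other even inside one process. */
static int conflict_demo(const char *path, int results[4]) {
    int a = open(path, O_RDWR | O_CREAT, 0600);
    int b = open(path, O_RDWR);
    if (a < 0 || b < 0) {
        if (a >= 0) close(a);
        if (b >= 0) close(b);
        return -1;
    }
    results[0] = try_ofd_lock(a, F_WRLCK, 0, 100);   /* a: exclusive on bytes 0-99 */
    results[1] = try_ofd_lock(b, F_RDLCK, 50, 10);   /* b: shared on 50-59 overlaps it: EAGAIN */
    results[2] = try_ofd_lock(b, F_RDLCK, 100, 100); /* b: shared on 100-199, no overlap */
    results[3] = try_ofd_lock(a, F_RDLCK, 150, 10);  /* a: shared on 150-159, shared + shared is fine */
    close(a);                                        /* last fd of each description: locks released */
    close(b);
    return 0;
}
```

Expected results are 0, EAGAIN, 0, 0: only the overlapping shared-versus-exclusive request is refused.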

Under the Hood

Advisory vs. mandatory: only one matters. All three mechanisms are advisory. They only prevent conflicting lock acquisitions — not conflicting I/O. A process that doesn't bother checking for locks can read and write freely. Mandatory locking (where read/write syscalls are blocked by locks) existed via mount -o mand plus a magic setgid-without-group-execute bit on the file, but it was unreliable (didn't cover mmap), had performance issues, and was deprecated in Linux 4.5 then removed in 5.15. In practice, all file locking is cooperative. It only works if every participant plays by the rules.
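To see "advisory" in practice, this sketch (Linux 3.15+, hypothetical helper name and path) takes an exclusive OFD lock on the whole file and then writes through a second fd that never asks about locks. The write is not blocked, because write() does not consult advisory locks.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Returns how many bytes a lock-ignoring writer managed to write (-1 on error). */
static ssize_t rude_write_despite_lock(const char *path) {
    int owner = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (owner < 0) return -1;
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0, .l_pid = 0 };   /* exclusive, whole file */
    if (fcntl(owner, F_OFD_SETLK, &fl) != 0) { close(owner); return -1; }

    int rude = open(path, O_WRONLY);      /* never calls fcntl() or flock() */
    ssize_t n = -1;
    if (rude >= 0) {
        n = write(rude, "ignored the lock\n", 17);   /* goes through: the lock is advisory */
        close(rude);
    }
    close(owner);
    return n;
}
```

rude_write_despite_lock() should return 17, the full length of the message, even though the file was exclusively locked the whole time.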

Deadlock detection. The kernel performs automatic deadlock detection for POSIX locks: if process A holds range X and waits for range Y, while process B holds Y and waits for X, the kernel detects the cycle and returns EDEADLK to one of them. This detection does NOT exist for flock() or OFD locks. Deadlocks with those mechanisms hang indefinitely. Careful lock ordering is essential to prevent cycles.
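The POSIX deadlock detector can be triggered with a classic two-byte cycle. This is a sketch assuming Linux; the function names, the pipe handshake, and the /tmp path are illustrative. Whichever request would close the cycle gets EDEADLK, and that may be either the parent or the child depending on scheduling, so the sketch counts EDEADLK results across both instead of assuming which side loses.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Blocking POSIX write lock on one byte. Returns 0 or errno (EDEADLK if the kernel sees a cycle). */
static int lock_byte_wait(int fd, off_t byte) {
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET, .l_start = byte, .l_len = 1 };
    return fcntl(fd, F_SETLKW, &fl) == 0 ? 0 : errno;
}

/* Parent holds byte 0 and wants byte 1; child holds byte 1 and wants byte 0.
 * Returns how many of the two got EDEADLK (expected: exactly 1), or -1 on error. */
static int posix_deadlock_demo(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    int ready[2];
    if (fd < 0 || pipe(ready) != 0) return -1;
    if (lock_byte_wait(fd, 0) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* POSIX locks are not inherited across fork(): the child starts with none. */
        if (lock_byte_wait(fd, 1) != 0) _exit(3);
        if (write(ready[1], "x", 1) != 1) _exit(3);   /* tell the parent byte 1 is held */
        int rc = lock_byte_wait(fd, 0);               /* wait for the parent's byte 0 */
        _exit(rc == EDEADLK ? 1 : rc == 0 ? 0 : 3);
    }

    close(ready[1]);                  /* so read() sees EOF if the child dies early */
    char c;
    int parent_rc = -1;
    if (read(ready[0], &c, 1) == 1)
        parent_rc = lock_byte_wait(fd, 1);            /* the request that closes the cycle fails */
    close(ready[0]);
    close(fd);                        /* releases the parent's locks so the child can finish */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) == 3)
        return -1;
    return (parent_rc == EDEADLK) + WEXITSTATUS(status);
}
```

The sketch should return 1: exactly one of the two waits is refused with EDEADLK, and the other then succeeds once the loser releases its byte.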

NFS changes everything. On NFS, Linux emulates flock() with fcntl()-style byte-range locks covering the whole file, carried over the NLM protocol (or NFSv4's built-in locking). As a result, flock() and fcntl() locks on the same file start conflicting with each other, which they never do on a local filesystem. This also silently changes flock()'s ownership semantics from fd-owned to pid-owned. Behavior carefully tested on local ext4 may not match what happens on NFS. For reliable distributed locking, use fcntl() directly and test on NFS, or skip file locks entirely and use external coordination (etcd, ZooKeeper, Redis).

SQLite's workaround. SQLite is the poster child for dealing with POSIX lock insanity. It maintains an internal "unix file" structure that reference-counts all fds opened to the same inode (keyed by st_dev + st_ino). It never calls close() while any fd to the same inode has locks held. It implements a 5-state locking protocol (UNLOCKED, SHARED, RESERVED, PENDING, EXCLUSIVE) using specific byte offsets in a lock page. The amount of engineering needed to make POSIX locks safe says everything about their design.
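A much-simplified sketch of the reference-counting idea, assuming a single-threaded caller that reports its own lock and unlock calls. Everything here (struct inode_info, guarded_close(), note_lock(), the fixed-size table) is hypothetical and far simpler than SQLite's real unix VFS. It only shows the core trick: key on (st_dev, st_ino) and postpone close() while any lock on that inode is held.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* open() and fcntl(), used by callers */
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_TRACKED 64

/* Hypothetical per-inode record keyed by (st_dev, st_ino). */
struct inode_info {
    bool used;
    dev_t dev;
    ino_t ino;
    int locks_held;                /* POSIX locks this process currently holds on the inode */
    int deferred[MAX_TRACKED];     /* fds whose close() was postponed */
    int ndeferred;
};

static struct inode_info inode_table[MAX_TRACKED];

/* Finds (or creates) the record for the inode behind fd; NULL if fstat fails or the table is full. */
static struct inode_info *inode_for_fd(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0) return NULL;
    struct inode_info *free_slot = NULL;
    for (int i = 0; i < MAX_TRACKED; i++) {
        struct inode_info *e = &inode_table[i];
        if (e->used && e->dev == st.st_dev && e->ino == st.st_ino) return e;
        if (!e->used && free_slot == NULL) free_slot = e;
    }
    if (free_slot != NULL)
        *free_slot = (struct inode_info){ .used = true, .dev = st.st_dev, .ino = st.st_ino };
    return free_slot;
}

/* Callers report each lock/unlock so the table knows when close() would be dangerous. */
static void note_lock(int fd) {
    struct inode_info *e = inode_for_fd(fd);
    if (e != NULL) e->locks_held++;
}

static void note_unlock(int fd) {
    struct inode_info *e = inode_for_fd(fd);
    if (e == NULL || e->locks_held == 0) return;
    if (--e->locks_held == 0) {
        /* No locks left to lose: the postponed closes are finally safe. */
        for (int i = 0; i < e->ndeferred; i++) close(e->deferred[i]);
        e->ndeferred = 0;
    }
}

/* Replacement for close(): postpones it while any lock on the same inode is held. */
static int guarded_close(int fd) {
    struct inode_info *e = inode_for_fd(fd);
    if (e != NULL && e->locks_held > 0 && e->ndeferred < MAX_TRACKED) {
        e->deferred[e->ndeferred++] = fd;
        return 0;
    }
    return close(fd);
}
```

When the table or the deferred list is full, guarded_close() falls back to a real close(), which is exactly the case where locks could still be lost; a production version would have to handle it.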

Common Questions

Why are POSIX fcntl() locks considered broken?

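Three reasons: (1) Closing ANY fd to the same file releases ALL of the process's locks on it, even if the closed fd never held a lock. (2) Locks aren't inherited across fork(): the child starts with no locks at all, so a worker process can't take over its parent's lock. (3) Threads in the same process can't exclude each other: locks are owned by the process, so a lock taken by one thread never blocks a conflicting request from another thread. OFD locks (F_OFD_SETLK, Linux 3.15+) fix all three by owning locks at the struct file level instead of the pid level (for threads, as long as each thread opens its own fd).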
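Reason (3) can be checked without creating threads, because POSIX ownership is per process: two independent open() calls in one process behave, for locking purposes, like two threads that each opened the file. A sketch assuming Linux 3.15+; the helper names and paths are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Whole-file, non-blocking write lock with the given command. Returns 0 or errno. */
static int try_whole_file_lock(int fd, int cmd) {
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0, .l_pid = 0 };
    return fcntl(fd, cmd, &fl) == 0 ? 0 : errno;
}

/* Models two threads that each open() the same file and lock it.
 * Returns the second opener's result: 0 means it was NOT excluded, EAGAIN means it was. */
static int second_opener_result(const char *path, int cmd) {
    int a = open(path, O_RDWR | O_CREAT, 0600);   /* "thread 1" */
    int b = open(path, O_RDWR);                   /* "thread 2" */
    int result = -1;
    if (a >= 0 && b >= 0 && try_whole_file_lock(a, cmd) == 0)
        result = try_whole_file_lock(b, cmd);
    if (b >= 0) close(b);
    if (a >= 0) close(a);
    return result;
}
```

Expect 0 with F_SETLK (the second "thread" simply takes over the same process-owned lock) and EAGAIN with F_OFD_SETLK.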

What's the difference between flock(LOCK_EX) and O_EXCL?

They're completely unrelated despite both having "exclusive" in the name. flock(LOCK_EX) acquires an advisory runtime lock on an already-open file — it blocks or fails if another process holds a conflicting lock. O_EXCL (with O_CREAT) makes file creation atomic: open() fails with EEXIST if the file already exists. O_EXCL is for creation-time exclusivity. flock is for runtime access control. Git's .lock file pattern uses O_EXCL, not flock.

How is a single-instance daemon implemented?

Open a PID file (/var/run/myapp.pid) with O_CREAT|O_WRONLY, but without O_TRUNC, so a second instance that loses the race doesn't wipe the running instance's PID. Call flock(fd, LOCK_EX|LOCK_NB). If it fails with EWOULDBLOCK, another instance is running, so exit. If it succeeds, truncate the file and write the PID. Critically: do NOT close the fd. The lock lives as long as the fd is open, which means it dies automatically when the process exits (even on crash). Use flock(), not POSIX locks, because flock() won't be accidentally released by closing some other fd to the same file.
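The steps above, sketched in C for Linux/glibc. acquire_single_instance() and its return-code convention are hypothetical, and O_CLOEXEC is an added assumption so that programs the daemon exec()s don't keep the lock alive after it exits.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Returns an fd that holds the instance lock (keep it open for the life of the process),
 * -2 if another instance already holds it, or -1 on any other error. */
static int acquire_single_instance(const char *pidfile) {
    /* No O_TRUNC: truncating before we own the lock would wipe the running instance's PID. */
    int fd = open(pidfile, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0) return -1;

    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        int err = errno;
        close(fd);
        return err == EWOULDBLOCK ? -2 : -1;
    }

    /* We own the lock now, so it is safe to replace the contents with our PID. */
    char buf[32];
    int len = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    if (ftruncate(fd, 0) != 0 || pwrite(fd, buf, (size_t)len, 0) != len) {
        close(fd);                    /* closing our only fd also releases the lock */
        return -1;
    }
    return fd;                        /* deliberately NOT closed: the lock lives as long as this fd */
}
```

A daemon would call it once at startup with its PID file path, exit if it gets -2, and simply never close the returned fd.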

Can locks protect against mmap()?

No. Advisory locks only affect lock acquisition, not I/O operations. A process that mmaps a file and reads/writes through the mapping completely bypasses any lock checks. This was also the fatal flaw of mandatory locking — it checked read()/write() but not mmap(), leaving a gaping hole. Coordinating access to mmapped files requires userspace synchronization (mutexes in shared memory, etc.).

How Technologies Use This

Nginx

A deployment script accidentally starts a second Nginx master on the same configuration while the first is still running. If both masters could serve traffic, they would fight over the same listen sockets and overwrite each other's PID file, producing intermittent failures under load that are nearly impossible to diagnose.

Nginx does not actually hold a lock on its PID file: the master writes its PID to the file named by the pid directive (commonly /run/nginx.pid) and closes it. What stops the second master is the socket layer: its bind() on the already-bound listen addresses fails with EADDRINUSE, and after a few retries Nginx logs "still could not bind()" and exits before serving anything. Where Nginx does use file locks is the accept mutex: on systems without atomic operations, the lock_file directive names a file whose lock serializes accept() and shared-memory access across worker processes.

The fail-fast behavior is the part worth copying. A service that has no port to conflict on can get it from flock() on a PID file: the second instance's LOCK_EX|LOCK_NB attempt fails with EWOULDBLOCK, and the lock self-heals on crash because the kernel drops it when the last fd closes. It costs nothing at runtime, which is exactly the behavior a process manager needs.

PostgreSQL

Two postmaster processes start against the same data directory. Without protection, both write to the write-ahead log simultaneously, producing immediate WAL corruption and unrecoverable data loss. There is no graceful recovery from this scenario.

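The postmaster prevents this with a lock file rather than a kernel lock: at startup it creates postmaster.pid in the data directory with open(O_CREAT|O_EXCL), so only one process can create it. If the file already exists, PostgreSQL reads the PID inside, checks whether that process is still alive (and whether its shared memory is still in use), and if so reports that the lock file already exists, with a hint asking whether another postmaster is running in that data directory, then exits before touching any data files. A stale file left by a crashed server is detected the same way and replaced. Beyond startup protection, PostgreSQL exposes locking to applications through pg_advisory_lock(), whose locks live in shared memory rather than on disk.

The create-with-O_EXCL-plus-liveness-check pattern suits single-instance protection of any process that manages exclusive state on disk, especially where kernel lock semantics can't be trusted (such as network filesystems); the cost is that stale lock files must be detected and cleaned up. For application-level coordination, pg_advisory_lock() provides distributed leader election, job queue deduplication, and double-processing prevention without external dependencies.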

Git

Two developers push to the same branch at the same instant. Without coordination, both would write conflicting SHAs into refs/heads/main, and the repository silently corrupts. One push's commit becomes unreachable with no error message.

Git sidesteps advisory locks entirely and uses the lockfile protocol. Before updating refs/heads/main, it creates refs/heads/main.lock using open() with O_CREAT|O_EXCL, which atomically fails with EEXIST if the file already exists. The winning push writes the new SHA to the .lock file, calls fsync(), then renames it over the original ref. The losing push gets EEXIST, retries briefly, and reports "failed to lock" if the conflict persists.
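The lockfile protocol, reduced to a C sketch. lockfile_update() is a hypothetical helper written for illustration, not Git's implementation (Git's own lockfile code also retries with a timeout and cleans up on signals). It assumes the target and its .lock file live in the same directory, so that rename() is atomic.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 on success, EEXIST if another writer holds "<path>.lock", or another errno. */
static int lockfile_update(const char *path, const char *contents) {
    char lock_path[4096];
    if (snprintf(lock_path, sizeof lock_path, "%s.lock", path) >= (int)sizeof lock_path)
        return ENAMETOOLONG;

    /* Atomic create-if-absent: exactly one concurrent writer succeeds, the rest get EEXIST. */
    int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) return errno;

    size_t len = strlen(contents);
    ssize_t n = write(fd, contents, len);
    if (n != (ssize_t)len || fsync(fd) != 0) {
        int err = (n < 0 || n == (ssize_t)len) ? errno : EIO;   /* a short write sets no errno */
        close(fd);
        unlink(lock_path);            /* give up the lock */
        return err;
    }
    close(fd);

    /* rename() replaces the target atomically and removes the .lock name in the same step. */
    if (rename(lock_path, path) != 0) {
        int err = errno;
        unlink(lock_path);
        return err;
    }
    return 0;
}
```

If a .lock file is already present, the helper leaves both files untouched and returns EEXIST, which mirrors the "failed to lock" path described above.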

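This pattern uses filesystem atomicity instead of advisory locks, avoiding the POSIX fcntl() footgun entirely. It works on local ext4 and on NFS (where O_EXCL creation is atomic on NFSv3 and later) because it depends only on atomic create and rename, not on a lock manager. The trade-off is that a crashed writer leaves a stale .lock file behind, which has to be removed by hand.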

Same Concept Across Tech

Technology	How it uses file locking	Key detail
SQLite	Uses POSIX fcntl() byte-range locks. Multiple readers, single writer	WAL mode reduces lock contention significantly
PostgreSQL	Advisory locks via SQL (pg_advisory_lock), not file locks. PID file is created with O_CREAT|O_EXCL	Postmaster PID file prevents double-start
Git	.git/index.lock uses O_EXCL create as a lock (not flock/fcntl)	Lock file existence = lock held
Docker	PID files and lock files for daemon single-instance	Uses flock() for simplicity
Nginx	PID file records the master's PID; a second master fails at bind()	File locks (lock_file) only for accept_mutex on systems without atomic operations
systemd	Manages PID files for Type=forking services	Validates PID file on startup

Comparison of locking mechanisms:

Feature	flock()	POSIX fcntl()	OFD locks
Granularity	Whole file	Byte range	Byte range
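Owner	Open file description	Process	Open file description
close() any fd releases?	No	YES (dangerous!)	No
fork() behavior	Shared with child (same description)	Not inherited	Shared with child (same description)
Thread-safe?	Yes, if each thread open()s its own fd	No (all threads share one owner)	Yes, if each thread open()s its own fd
NFS support	Emulated via fcntl() locks (semantics change)	Yes (NLM or NFSv4 locking)	Yes (NLM or NFSv4 locking)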

Design Rationale

Mandatory locking would require checking lock state on every read() and write(), adding overhead to the hottest I/O path even when no locks exist -- so the POSIX committee chose advisory locking. Cooperative, yes, but the alternative was unreliable anyway because mandatory locks could not cover mmap() without unacceptable complexity. POSIX fcntl() locks used per-process ownership because the original use case was database record locking with one process managing many fds, but that design produced the devastating "close any fd releases all locks" behavior. OFD locks (Linux 3.15) anchored ownership to the open file description instead, finally fixing the problem. The broken POSIX semantics could not be changed without violating the standard, so a new API was the only way out.

If You See This, Think This

Symptom	Likely cause	First check
Lock silently released, no error	POSIX fcntl lock dropped by close() on unrelated fd to same file	Switch to OFD locks or flock()
Two processes both think they hold the lock	Advisory lock not checked by one process (advisory = not enforced)	Check if all processes use the same locking mechanism
Lock held by dead process (stale lock)	The kernel releases flock(), POSIX, and OFD locks when the holder exits, so the usual culprit is a child that inherited the fd (flock/OFD) and is still running	Run lslocks or fuser on the file to find the real holder
Lock file exists but no process holds it	O_EXCL-based lock file not cleaned up after crash	Check PID inside lock file, remove if process is dead
NFS file locking fails silently	NFS lock manager (NLM) not running or unreliable	Avoid relying on NFS locks for correctness
fork() child inherits unwanted locks	flock() and OFD locks are shared with the child through the inherited fd (POSIX locks are not inherited)	Close the lock fd in the child, or open it with O_CLOEXEC if the child execs

When to Use / Avoid

Use flock() when:

Need simple whole-file locking (PID files, log rotation)
Lock must survive fork() (flock locks are inherited by child processes)

Use OFD locks (fcntl F_OFD_SETLK) when:

Need byte-range locking that survives close() of other fds to the same file
Need locks that work correctly in multi-threaded programs
Targeting Linux 3.15 or later

Avoid POSIX fcntl() locks unless:

Need byte-range locking on older kernels without OFD support
Fully understand that ANY close() on the same file releases ALL POSIX locks

Never rely on advisory locks alone when:

Other programs may access the file without checking for locks (they can still write freely)

Try It Yourself

# Display all active file locks: lock ID, type (POSIX/FLOCK), mode (READ/WRITE), PID, device:inode, byte range
cat /proc/locks

# User-friendly listing of all locks with process name, type, size, mode, and resolved file path
lslocks

# Shell-level flock: acquire an exclusive lock on the file, run the command, release on exit. Used for cron job mutual exclusion
flock /tmp/mylock.lck -c 'echo locked; sleep 10'

# Show which processes have the dpkg lock file open
fuser -v /var/lib/dpkg/lock

# Trace SQLite's locking protocol: observe the sequence of fcntl F_SETLK calls
strace -e flock,fcntl sqlite3 /tmp/test.db '.tables'

# Bash pattern for single-instance scripts: open fd 9 on the lock file, try a non-blocking flock
exec 9>/tmp/pidfile.lock; flock -n 9 || { echo 'already running'; exit 1; }

Debug Checklist

1. Check file locks: cat /proc/locks
2. Check locks for a specific process: lslocks -p <pid>
3. Check if a file is flock()-locked: flock -n /path/to/file -c 'echo unlocked' || echo 'locked' (this only detects flock() locks; on a local filesystem, POSIX and OFD locks don't conflict with flock())
4. List all locks system-wide: lslocks
5. Check lock type: cat /proc/locks (FLOCK = flock, POSIX = fcntl, OFDLCK = OFD)
6. Trace lock operations: strace -e flock,fcntl -p <pid>

Key Takeaways

✓POSIX fcntl() locks have a devastating footgun: locks are owned by (pid, inode), not by fd. If you open the same file on a different fd and close it, ALL your locks on that inode vanish, silently. SQLite carries an entire reference-counting layer just to survive this
✓flock() locks are tied to the struct file (open file description), not the pid. dup() and fork() share the lock, but independent open() calls get independent locks. This is usually the saner default for whole-file locking
✓OFD locks (F_OFD_SETLK, Linux 3.15+) are the modern fix — struct-file ownership like flock(), plus byte-range support like fcntl(). If you're writing new code that needs record locking, use OFD locks
✓Mandatory locking is dead. Deprecated in Linux 4.5, removed in 5.15. It never covered mmap(), had race conditions, and was never reliable. All locking in production is cooperative advisory locking
✓The kernel detects deadlocks for POSIX locks (returns EDEADLK) but NOT for flock() or OFD locks — those just hang forever if you create a cycle. Design your lock ordering carefully

Common Pitfalls

✗Using POSIX fcntl() locks in library code — any other code in the same process that opens and closes the same file silently releases your locks. Libraries can't control what the rest of the process does with fds
✗Assuming flock() works properly on NFS — Linux emulates flock() via fcntl() byte-range locks on NFS, which changes its ownership semantics from fd-owned to pid-owned. The behavior you tested locally won't match production
✗Spinning with F_SETLK in a loop instead of using F_SETLKW — this wastes CPU for no reason. F_SETLKW blocks in the kernel with proper waitqueue semantics and wakes you when the lock is available
✗Forgetting that ALL advisory locks are optional — they only work if every process accessing the file cooperates by checking locks. A rogue process that ignores locking can read and write freely

Reference

System Calls

flock, fcntl, lockf, open

Tools

/proc/locks, lslocks, fuser


In One Line

POSIX fcntl() locks vanish when any fd to the same file is closed -- use flock() for whole-file locking and OFD locks for byte-range, and treat POSIX locks as a legacy trap.
