Directory Entries & Path Resolution
Mental Model
Finding Room 3-B-7 in a building. Walk to Floor 3 (badge check), find Hall B (door unlocked?), then Room 7. Each step is a separate lookup. A receptionist near the entrance memorizes recently visited rooms -- frequent visitors skip the walk entirely and get directions in seconds. That receptionist is the dentry cache. The floor/hall/room numbering system is path resolution.
The Problem
A web server opens /var/www/app/static/css/main.css on every request -- six path components, six directory lookups. At 50,000 requests per second that adds up to 300,000 directory lookups per second. Without the dentry cache, each one hits disk. Right after a reboot or a bout of memory pressure, the first wave of requests runs 100x slower because every component has to be loaded fresh from the filesystem.
Architecture
Every time a file is opened, the kernel doesn't just jump straight to the data. It walks.
/home/user/docs/report.txt isn't a single lookup. It's four separate lookups: "home" inside /, then "user" inside home, then "docs" inside user, then "report.txt" inside docs. Each step checks permissions. Each step might cross a mount point into a different filesystem. Each step might follow a symlink into an entirely different part of the tree.
And this walk happens on every open(), stat(), and exec() call. On a busy server, that's millions of walks per second. Without caching, the system would spend all its time reading directories from disk just to find files it already knew about.
What Actually Happens
When a process calls open("/home/user/docs/report.txt", ...), the kernel's namei subsystem takes over:
- Start at the root dentry (/)
- Hash ("home", parent=root dentry) and look it up in the dcache hash table
- Cache hit? Great — grab the inode, no disk I/O
- Cache miss? Fall back to reading the directory from disk, create a dentry, cache it
- Check execute permission on the directory (the process needs x to traverse it)
- Check if this dentry is a mount point — if so, cross into the mounted filesystem
- Check if this is a symlink — if so, read the target and restart resolution (up to 40 deep)
- Repeat for "user", "docs", "report.txt"
The kernel carries a struct nameidata through this walk — it tracks the current position (dentry + mount), the remaining path, resolution flags, and a symlink depth counter to catch loops.
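The loop above can be sketched from userspace. This is a minimal sketch, not the kernel's actual code: it uses Python's os.open() with the dir_fd parameter (which maps to openat() on Linux) to perform one lookup per component, and it deliberately skips symlink handling, mount crossing, and the dcache itself.

```python
import os, tempfile

def resolve(path):
    """Walk an absolute path one component at a time, mimicking the
    kernel's namei loop (no symlinks, no mount crossing, no cache)."""
    assert path.startswith("/")
    fd = os.open("/", os.O_RDONLY | os.O_DIRECTORY)  # start at the root dentry
    try:
        parts = [p for p in path.split("/") if p]
        for i, name in enumerate(parts):
            # One lookup relative to the previous component, analogous to
            # hashing (name, parent dentry) into the dcache hash table.
            want_dir = os.O_DIRECTORY if i < len(parts) - 1 else 0
            next_fd = os.open(name, os.O_RDONLY | want_dir, dir_fd=fd)
            os.close(fd)
            fd = next_fd
        return fd
    except OSError:
        os.close(fd)
        raise

# Build a tiny tree and resolve through it, step by step.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "docs"))
with open(os.path.join(root, "docs", "report.txt"), "w") as f:
    f.write("hello")

fd = resolve(os.path.join(root, "docs", "report.txt"))
print(os.read(fd, 5))  # b'hello'
os.close(fd)
```

Each iteration is one of the steps listed above; in the real kernel, the os.open() call is replaced by a dcache hash lookup with a disk read only on a miss.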
Two walk modes. The kernel first tries RCU-walk: lockless path resolution using RCU read-side protection. No atomic operations on dentry reference counts. No locks. This makes hot-path lookups nearly zero-cost. If RCU-walk hits a snag — cache miss, need to sleep for a permission check, concurrent rename happening — it gracefully falls back to ref-walk, which takes proper locks and references. The fast path stays fast; the slow path stays correct.
The openat() family. Traditional path resolution starts from the process's current working directory or from root. But openat(dirfd, "relative/path", ...) resolves from a specific directory fd. This matters for security: once a directory has been opened and validated, openat() guarantees resolution stays within that directory, even if someone renames or replaces the path between the check and the use. This eliminates TOCTOU (time-of-check-to-time-of-use) vulnerabilities. The special value AT_FDCWD means "use the current working directory," making openat(AT_FDCWD, path, flags) equivalent to open(path, flags).
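Python exposes the openat() family through os.open()'s dir_fd parameter, and dir_fd=None is the AT_FDCWD case. A small sketch (the directory and file names are illustrative):

```python
import os, tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, "config"), "w") as f:
    f.write("x=1")

# Open and validate the directory once; the fd pins that directory.
dirfd = os.open(d, os.O_RDONLY | os.O_DIRECTORY)

# Resolution starts from dirfd, not cwd or root: openat(dirfd, "config", ...)
fd = os.open("config", os.O_RDONLY, dir_fd=dirfd)
text = os.read(fd, 16).decode()
print(text)  # x=1
os.close(fd)

# Omitting dir_fd is the AT_FDCWD case: resolve relative to the cwd.
os.chdir(d)
fd2 = os.open("config", os.O_RDONLY)  # like openat(AT_FDCWD, "config", ...)
os.close(fd2)
os.close(dirfd)
```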
Under the Hood
Directory on-disk formats. Directories are just files whose data blocks contain directory entry records. But the on-disk format matters enormously for lookup speed. ext4 uses the HTree — a hashed B-tree where filenames are hashed (half-MD4) and organized for O(1) amortized lookup. Without it (on ext2, or small ext4 directories), lookups are linear scans: O(n) per lookup. In a Maildir with a million messages, that's the difference between instant and unbearable.
Negative dentries in detail. When a lookup fails (file doesn't exist), the kernel creates a dentry with a NULL inode pointer. This negative dentry sits in the cache and short-circuits future lookups for the same non-existent name — returning ENOENT immediately without touching disk. Negative dentries are evicted under memory pressure via the dcache shrinker. They're especially critical during $PATH search (where the shell tries every directory for a command) and header file resolution in compilers.
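What a negative dentry saves is invisible from userspace, but the repeated-miss pattern it optimizes is easy to show: every lookup below fails identically, and after the first miss the kernel can answer from the cached "not found" entry instead of re-reading directory blocks. The path is hypothetical and just needs to not exist.

```python
import errno, os

misses = 0
for _ in range(3):
    try:
        os.stat("/no/such/dir/at/all")   # hypothetical nonexistent path
    except OSError as e:
        assert e.errno == errno.ENOENT   # the result a negative dentry caches
        misses += 1
print(misses)  # 3
```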
The '.' and '..' entries. Every directory contains two special entries: . (self) and .. (parent). These are real on-disk directory entries, not illusions. The . entry is a hard link to the directory's own inode — which is why st_nlink for an empty directory is 2 (the parent's entry for this directory, plus this directory's own .). Each subdirectory adds 1 to the parent's link count via its .. entry. That quirk is why find -type d can optimize tree traversal by counting st_nlink — if a directory has link count 2, it has no subdirectories.
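The link-count arithmetic can be checked directly. This holds on filesystems that follow the classic convention (ext4, XFS, tmpfs); btrfs reports st_nlink of 1 for directories regardless, so the sketch prints the values rather than assuming them:

```python
import os, tempfile

d = tempfile.mkdtemp()
before = os.stat(d).st_nlink      # 2 classically: parent's entry + d's own "."
os.mkdir(os.path.join(d, "sub"))
after = os.stat(d).st_nlink       # the new subdirectory's ".." adds one
print(before, after)
```

This is exactly the invariant find -type d exploits: a directory whose link count is 2 has no subdirectories.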
Memory impact. Each dentry is approximately 192 bytes. Walk a million-file directory tree and a million dentries get created — about 192 MB of kernel memory. The dcache grows unboundedly, limited only by available memory. Under pressure, the kernel's dcache shrinker evicts LRU dentries. Forcing eviction with echo 2 > /proc/sys/vm/drop_caches is a sledgehammer that kills performance across the board. Better to monitor /proc/sys/fs/dentry-state and slabtop to understand cache pressure.
Common Questions
What is a TOCTOU vulnerability and how does openat() fix it?
TOCTOU (time-of-check-to-time-of-use) happens when a condition is checked and acted on in separate syscalls. Between access("/safe/dir/file", R_OK) and open("/safe/dir/file", ...), an attacker can replace /safe/dir with a symlink to /etc/shadow. The two calls resolve the same path string to different files. openat() fixes this: once /safe/dir is opened as fd 5, openat(5, "file", ...) resolves within that specific directory instance, regardless of what happens to the path name later. The directory is pinned by the fd — it can be renamed or unlinked, but its contents don't change.
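The pinning behavior is easy to demonstrate: rename the directory after opening it, and resolution through the saved fd still lands inside the original directory. A sketch using Python's dir_fd (which maps to openat() on Linux); the directory names are illustrative:

```python
import os, tempfile

base = tempfile.mkdtemp()
safe = os.path.join(base, "safe")
os.mkdir(safe)
with open(os.path.join(safe, "file"), "w") as f:
    f.write("ok")

# Check: open and validate the directory once.
dirfd = os.open(safe, os.O_RDONLY | os.O_DIRECTORY)

# An attacker swaps the name between check and use...
os.rename(safe, os.path.join(base, "swapped"))

# ...but the use goes through the fd, which pins the directory object,
# not the path string. The lookup still succeeds.
fd = os.open("file", os.O_RDONLY, dir_fd=dirfd)
data = os.read(fd, 16)
print(data)  # b'ok'
os.close(fd)
os.close(dirfd)
```

Had the second open used the original path string /…/safe/file instead, it would have failed (or worse, resolved to whatever the attacker put there).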
Why does readdir() return files in seemingly random order?
Because directories aren't sorted. ext4 HTree organizes entries by filename hash for fast lookup, so readdir() returns them in hash order — which looks random. Small directories without HTree return entries in creation order. XFS uses B+ trees, returning entries in yet another hash order. If alphabetical output is needed, sort it after reading. ls does exactly this.
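The ordering is easy to observe: create a few files and read the directory raw. Only the sorted output is predictable, since the raw order depends on the filesystem and its directory format:

```python
import os, tempfile

d = tempfile.mkdtemp()
for name in ["banana", "apple", "cherry"]:
    open(os.path.join(d, name), "w").close()

raw = os.listdir(d)     # readdir() order: hash order on ext4 HTree,
print(raw)              # creation order on small unindexed directories
print(sorted(raw))      # what ls shows you, after sorting
```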
What's the deal with the execute bit on directories?
The execute (search) bit on a directory controls whether a process can traverse through it. The x permission is required to reach anything inside. The read bit (r) controls listing — whether the names of entries are visible. A directory can have x without r (traversable, but not listable) or r without x (names visible, but the files themselves unreachable). This distinction is how a directory can require users to know the exact filename to access a file — a poor man's access control that's used more often than expected.
How can the dcache consume all system memory?
Every new path resolution that misses the cache creates a new dentry that stays in memory. A process that walks a million-file tree creates a million dentries — 192 MB. Multiply by many processes and many trees, and the dcache can dominate kernel memory usage. The kernel's dcache shrinker, triggered by the slab allocator's reclaim path, evicts LRU dentries when memory gets tight. Writing to /proc/sys/vm/drop_caches forces eviction, but kills performance. The real solution is monitoring: watch /proc/sys/fs/dentry-state and slabtop for creeping growth.
How Technologies Use This
A build server reboots, git status runs on a 100,000-file monorepo, and it takes 8 seconds. Five minutes later the same command finishes in 200 milliseconds. Nothing changed in the repo. The difference is entirely in the dentry cache.
Each git status triggers 100,000 lstat() calls, and each lstat() walks the path component by component through the dentry cache. On a cold cache after reboot, every lookup hits disk to read directory blocks. On a warm cache, the same walks resolve in nanoseconds via RCU-walk with zero locks. The cache is the entire performance story.
Git introduced core.fsmonitor to escape this bottleneck entirely. It uses inotify to track changed paths between invocations, so git status skips unchanged directories and reduces lstat() calls from 100K to just the files that actually changed.
A container host running 300 microservices shows 3GB of kernel slab memory consumed, yet no single process looks responsible. Applications are not leaking memory. The OOM killer is not firing. The memory is simply gone.
The culprit is dentries. Every file open inside a container triggers a path walk through overlay2, which checks the upper layer, then each lower layer, generating a separate dentry for each lookup attempt. At roughly 192 bytes per dentry, millions of cached path entries from hundreds of containers consume gigabytes of slab memory. The kernel dcache shrinker reclaims entries under memory pressure, but aggressive reclaim causes cache thrashing where the same paths get evicted and re-read from disk repeatedly.
Monitor with slabtop and watch the dentry_cache slab. Tuning vm.vfs_cache_pressure above 100 makes the shrinker more aggressive if memory is tight. Understanding dentry overhead is essential for sizing container host memory.
Same Concept Across Tech
| Technology | How path resolution affects it | Key detail |
|---|---|---|
| Docker | OverlayFS merges upper+lower directory entries. Path resolution checks both layers | More layers = more lookups per path component |
| Kubernetes | ConfigMap/Secret volumes use symlinks that swap atomically | Path walk follows the symlink to the current version |
| Nginx | Every HTTP request resolves a filesystem path | Deep document root nesting adds latency per request |
| Git | .git/objects uses 2-character directory fan-out to avoid million-entry directories | Fan-out keeps directory sizes manageable |
| Node.js | require() walks up the directory tree checking node_modules at each level | Deep project nesting = many directory lookups per require |
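The Git row is concrete enough to compute. A loose object's id is the SHA-1 of a small header plus the content, and the fan-out path splits off the first two hex digits, capping the objects directory at 256 entries. A sketch (the computed path is illustrative, not written anywhere):

```python
import hashlib

content = b"hello\n"
header = b"blob %d\x00" % len(content)   # Git's loose-object header
oid = hashlib.sha1(header + content).hexdigest()

# 2-character fan-out: at most 256 subdirectories, each holding a
# small fraction of the objects, so no directory grows huge.
path = f".git/objects/{oid[:2]}/{oid[2:]}"
print(path)
```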
Stack layer mapping (slow file open):
| Layer | What to check | Tool |
|---|---|---|
| Application | How deep is the path? How many opens per second? | strace -c -e openat |
| Dentry cache | Are lookups hitting cache or going to disk? | slabtop, /proc/sys/fs/dentry-state |
| Filesystem | Directory size? Indexing enabled (htree for ext4)? | ls -la dir, tune2fs -l |
| Kernel | vfs_cache_pressure tuning? Memory pressure evicting dentries? | /proc/sys/vm/vfs_cache_pressure |
| Storage | Disk I/O for directory reads? | iostat -x |
Design Rationale
A filename is a relationship -- "this name points to that file" -- not a property of the file itself. Storing names inside inodes would have killed hard links and forced every rename to rewrite data on disk, so directory entries were split out. The dentry cache followed because path resolution is the hottest code path in the VFS; hitting disk for each component of every open(), stat(), and exec() would be catastrophic at scale. RCU-walk came later still, once even the reference-counting overhead of traditional locking became a bottleneck on many-core machines -- a lockless fast path for cached, uncontended lookups brought per-lookup cost from hundreds of nanoseconds down to single digits.
If You See This, Think This
| Symptom | Likely cause | First check |
|---|---|---|
| ls is slow on a directory | Directory has millions of entries | du -s dir or count files |
| ELOOP error on open/stat | Symlink loop or too many levels of symlink indirection (max 40) | readlink -f to trace the chain |
| First request after reboot is 100x slower | Cold dentry cache, all path components loaded from disk | Expected behavior, warms up quickly |
| High dentry slab memory on host with many containers | Each container's filesystem adds dentries to the cache | slabtop, consider tuning vfs_cache_pressure |
| ENOENT despite file existing | Race condition (file deleted between readdir and open) or stale negative dentry | Check for concurrent writers |
| Application startup slow due to many config files | Deep path resolution for hundreds of config/library files | strace -e openat to count file opens |
When to Use / Avoid
Relevant when:
- Debugging slow file operations on first access (cold dentry cache)
- Understanding why ls on a directory with millions of files is slow
- Working with deeply nested paths that require many lookup steps
- Diagnosing high dentry slab memory usage on hosts with many files
Watch out for:
- Directories with millions of entries slow down even with htree indexing (ext4)
- symlink resolution adds extra path walks (limit: 40 symlink follows per resolution)
- mount points are crossed transparently during path walk (can surprise when debugging)
- Negative dentries (cached "file does not exist" results) also consume memory
Try It Yourself
# Show dentry cache stats: total allocated, unused, age_limit; the first field reveals dcache size
cat /proc/sys/fs/dentry-state

# Show kernel slab caches sorted by cache size; dentry and inode_cache are typically the largest
slabtop -s c | head -5

# Trace openat() calls to see how ls resolves and reads directory entries
strace -e openat ls /tmp

# Query the maximum path length for a filesystem; typically 4096 bytes on Linux
getconf PATH_MAX /

# Show the current working directory of the shell process via its /proc symlink
ls -la /proc/$$/cwd

# List directory entries of /proc/self/fd; equivalent to readdir() on the virtual proc filesystem
python3 -c "import os; print(os.listdir('/proc/self/fd'))"
Debug Checklist
1. Check dentry cache size: slabtop | grep dentry
2. Check dentry cache stats: cat /proc/sys/fs/dentry-state
3. Monitor path resolution cost: perf trace -e openat,stat -- ls /deep/path
4. Check directory size: ls -la /path (a large Size means many entries)
5. Check filesystem type: df -T /path
6. Check vfs_cache_pressure: cat /proc/sys/vm/vfs_cache_pressure (100 = default)
Key Takeaways
- ✓ Hot path lookups are nearly free. RCU-walk resolves paths without taking any locks or dentry reference counts — it only falls back to ref-walk on cache misses, sleeping permission checks, or concurrent renames
- ✓ openat(dirfd, "relative/path") eliminates an entire class of security bugs: it resolves paths from a pinned directory fd, so nobody can swap a directory out from under you between checking and using it (TOCTOU)
- ✓ The cache remembers "not found" too. Negative dentries prevent repeated disk reads for names that don't exist — critical every time your shell searches $PATH or a compiler checks include directories
- ✓ The dcache can quietly eat gigabytes of kernel memory (check /proc/sys/fs/dentry-state). The kernel's dcache shrinker reclaims unused entries under memory pressure via LRU eviction
- ✓ ext4 doesn't scan directories linearly — it uses an HTree (a hashed B-tree) for O(1) lookup by name. Without it, a directory with a million files would require scanning every entry for each lookup
Common Pitfalls
- ✗ Using access() then open() as separate calls — this is a textbook TOCTOU vulnerability. Between the check and the use, an attacker can swap the file. Use openat() or just open() and check the return value
- ✗ Passing a too-small buffer to getcwd() — it returns NULL with errno set to ERANGE. Pass a NULL buffer (a glibc extension that allocates for you) or allocate PATH_MAX bytes
- ✗ Expecting readdir() to return files in alphabetical or creation order — it doesn't. ext4 HTree returns entries in hash order (looks random). If you need sorted output, sort it yourself
- ✗ Not handling concurrent modifications during readdir() — on large directories with hash rebalancing, readdir may skip or duplicate entries. Don't assume a single pass sees a perfect snapshot
Reference
In One Line
Every open() and stat() walks directory entries one component at a time -- the dentry cache makes it fast, but million-entry directories and deep nesting still hurt.