tmpfs & ramfs -- In-Memory Filesystems
Mental Model
A whiteboard in a meeting room. Writing and erasing is instant -- no printers, no filing cabinets. The whiteboard has a fixed size (tmpfs with size= limit). If someone fills it up, the next person cannot write until space is cleared. When everyone leaves the room (reboot), the whiteboard is wiped clean. ramfs is the same whiteboard but with no boundary -- people keep taping on extra panels until the room is full and nobody can move.
The Problem
A container running on a host with 4 GB of RAM reports ENOSPC writing to /tmp. The host disk has 200 GB free. The confusion: /tmp is not on disk. It is a tmpfs mount, and the default size is 50% of host RAM -- 2 GB. Three containers share that 2 GB ceiling. One container's log rotation dumps 1.5 GB of compressed archives to /tmp before moving them to S3. That single operation starves the other two containers of tmpfs space. Increasing host disk does nothing. The fix is either setting an explicit size limit per container tmpfs mount or moving the staging directory to a disk-backed path.
Architecture
A file written to /tmp inside a container disappears the moment the container stops. It never existed on disk. The filesystem backing that write was tmpfs -- a filesystem that stores data in RAM, enforces a size limit, and reclaims every page the instant the mount is destroyed.
tmpfs is everywhere in a modern Linux system. It backs /dev/shm for shared memory, /run for runtime state, /tmp in most containers, and the anonymous memory behind mmap(MAP_SHARED|MAP_ANONYMOUS). Understanding how it works -- and how it differs from ramfs -- is the difference between a well-tuned system and one that OOMs under load.
How tmpfs Works
tmpfs is implemented in mm/shmem.c and integrates directly with the kernel's page cache and swap subsystem. Here is the lifecycle of a file on tmpfs:
- A process calls open("/tmp/data", O_CREAT|O_RDWR). The VFS routes the call to shmem_create().
- An inode is allocated from the tmpfs superblock. No disk block is reserved.
- The process calls write(). The shmem_write_begin() function allocates a page from the page cache.
- Data is copied from userspace into the page. The page exists only in RAM.
- Under memory pressure, the kernel can swap tmpfs pages to the swap device, just like anonymous memory pages.
- When the file is deleted or the mount is unmounted, pages are freed immediately. No disk blocks to deallocate, no journal to update.
The key insight: tmpfs pages are demand-allocated. Mounting a 10 GB tmpfs consumes zero memory. Only actual file writes allocate pages. An empty tmpfs mount costs nothing except a superblock and root inode.
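Demand allocation is observable from userspace. A minimal Python sketch (assuming a Linux host; the /dev/shm path in the usage comment is one common tmpfs mount) reads a mount's used and total bytes via statvfs:

```python
import os

def tmpfs_usage(mount_point):
    """Return (used_bytes, total_bytes) for a mounted filesystem.

    On a tmpfs mount, total_bytes reflects the size= ceiling, while
    used_bytes grows only as pages are actually allocated by writes.
    """
    st = os.statvfs(mount_point)
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return used, total

# Example (on a Linux host): a freshly mounted 10 GB tmpfs reports
# total of ~10 GB but used of ~0 -- mounting reserves nothing.
# used, total = tmpfs_usage("/dev/shm")
```

Running this before and after writing a file to a tmpfs mount shows used climbing by exactly the file's size, and dropping back to the previous value the moment the file is deleted.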
tmpfs vs ramfs
These two filesystems look identical from userspace but have fundamentally different resource management:
| Feature | tmpfs | ramfs |
|---|---|---|
| Size limit | Yes (size= mount option) | No (grows until OOM) |
| Swap support | Yes (pages can be swapped) | No (pages pinned in RAM) |
| Accounting | Yes (shows in df, /proc/meminfo) | No (invisible to df) |
| Default size | 50% of RAM | Unlimited |
| Production use | /dev/shm, /run, /tmp | Almost never |
| Implementation | mm/shmem.c (complex) | fs/ramfs/ (trivial) |
ramfs was the original RAM-backed filesystem in Linux. It served as a proof of concept: any page cache-backed filesystem with no writeback path automatically keeps data in RAM. But without size enforcement or swap support, ramfs is dangerous for any user-writable mount. tmpfs replaced it for every production use case.
The Mount Options That Matter
mount -t tmpfs -o size=1G,nr_inodes=10000,mode=1777,noexec,nosuid tmpfs /mnt/scratch
- size=1G: Maximum total file data. Can be specified as percentage (size=50%). Default is 50% of RAM. This is a ceiling, not a reservation.
- nr_inodes=10000: Maximum number of files and directories. Default scales with RAM. Set this on mounts exposed to untrusted code to prevent inode exhaustion attacks.
- mode=1777: The sticky bit, matching /tmp semantics. All users can create files, but only owners can delete them.
- noexec: Prevents execution of binaries on the mount. Critical for /tmp on security-hardened systems.
- nosuid: Ignores setuid/setgid bits. Prevents privilege escalation via files staged in tmpfs.
- huge=within_size: Enables transparent huge page support for tmpfs. Reduces TLB misses for large shared memory segments.
Live resizing without unmount:
# Expand /dev/shm to 2 GB while it is in use
mount -o remount,size=2G /dev/shm
This is non-disruptive. Existing files and mappings are unaffected. Shrinking below current usage fails with EBUSY.
/dev/shm and POSIX Shared Memory
/dev/shm is a tmpfs mount that provides the backing store for POSIX shared memory. When a process calls shm_open("/my_buffer", O_CREAT, 0600), the C library creates a file at /dev/shm/my_buffer. The process then calls ftruncate() to set the size and mmap() to map it into its address space.
A second process opens the same name with shm_open() and maps it. Both processes now have virtual addresses pointing to the same physical pages -- zero-copy, zero-syscall data sharing on the data path.
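The same zero-copy pattern can be sketched in Python with multiprocessing.shared_memory, which on Linux creates its backing file under /dev/shm just as shm_open() does. The segment name demo_metrics is arbitrary, and this is a single-process sketch of what the two cooperating processes each do:

```python
from multiprocessing import shared_memory

def publish(name, payload):
    # Writer side: roughly what shm_open() + ftruncate() + mmap() do in C.
    # On Linux this creates /dev/shm/<name> backed by tmpfs pages.
    seg = shared_memory.SharedMemory(create=True, size=len(payload), name=name)
    seg.buf[: len(payload)] = payload
    return seg

def read_back(name, n):
    # Reader side: attaches to the same physical pages -- no data copy
    # through kernel buffers on the read path.
    seg = shared_memory.SharedMemory(name=name)
    data = bytes(seg.buf[:n])
    seg.close()
    return data

# writer = publish("demo_metrics", b"snapshot-1")
# read_back("demo_metrics", 10)  # -> b"snapshot-1"
# writer.close(); writer.unlink()  # unlink removes /dev/shm/demo_metrics
```

The unlink() step matters: like any tmpfs file, a named segment persists until explicitly removed or the mount goes away, so a crashed writer can leave segments behind in /dev/shm.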
# Check /dev/shm usage
df -h /dev/shm
# List active shared memory segments
ls -la /dev/shm/
# Check the Docker default (64 MB)
docker inspect --format '{{.HostConfig.ShmSize}}' my_container
The 64 MB default in Docker is the single most common cause of shared memory failures in containers. PostgreSQL, Oracle, MATLAB, and MPI-based applications all require larger segments.
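A preflight check at startup can fail fast with a clear message instead of crashing mid-run. A hedged sketch (assumes a Linux container with /dev/shm mounted; the 256 MB figure is an illustrative requirement, and check_shm is a hypothetical helper, not part of any library):

```python
import os

def shm_capacity(path="/dev/shm"):
    # Total size of the tmpfs mount backing POSIX shared memory.
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks

def check_shm(required_bytes, path="/dev/shm"):
    # Raise early if the mount's size= ceiling is below what we need.
    cap = shm_capacity(path)
    if cap < required_bytes:
        raise RuntimeError(
            f"{path} is {cap} bytes but {required_bytes} are needed; "
            "start the container with a larger --shm-size"
        )

# check_shm(256 * 1024 * 1024)  # e.g. an app needing 256 MB of shared memory
```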
/run and Runtime State
systemd mounts /run as tmpfs early in boot, before the root filesystem is mounted read-write. This solves the stale PID file problem: if a daemon crashes and leaves /var/run/sshd.pid behind, the next boot would see the file and think sshd is already running. With /run on tmpfs, every boot starts clean.
# Typical /run contents
ls /run/
# Output: lock/ systemd/ user/ sshd.pid dbus/ ...
# Check /run mount and size
findmnt /run
# TARGET SOURCE FSTYPE OPTIONS
# /run tmpfs tmpfs rw,nosuid,nodev,noexec,size=1612860k,mode=755
Containers and tmpfs
Container runtimes create tmpfs mounts inside the container's mount namespace for isolation and performance:
Docker:
# Custom tmpfs at /tmp with size limit and security options
docker run --tmpfs /tmp:rw,noexec,nosuid,size=256m alpine sh
# Override /dev/shm size for database containers
docker run --shm-size=1g postgres:16
Kubernetes:
volumes:
- name: scratch
emptyDir:
medium: Memory # Creates a tmpfs-backed volume
sizeLimit: 512Mi # Enforced by kubelet eviction
Without sizeLimit, a Kubernetes emptyDir with medium: Memory defaults to 50% of node RAM. A pod writing unbounded data to this volume can trigger node-level memory pressure and affect every pod on the node.
Performance Characteristics
tmpfs eliminates the entire block I/O stack. A write() to tmpfs copies data from userspace to a page cache page. There is no block layer, no I/O scheduler, no device driver, no disk seek, no write-ahead log. The latency profile:
| Operation | tmpfs | ext4 (SSD) | ext4 (HDD) |
|---|---|---|---|
| 4 KB write | 0.5-2 us | 10-50 us | 2-10 ms |
| 4 KB read | 0.3-1 us | 5-20 us | 5-15 ms |
| fsync | no-op | 50-500 us | 5-30 ms |
| Sequential R/W | ~50 GB/s | 500 MB-3 GB/s | 100-200 MB/s |
fsync() on tmpfs is a no-op because there is no durable storage to flush to. This makes tmpfs unsuitable for any data that must survive power loss, but ideal for scratch data where fsync overhead is pure waste.
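The gap is easy to measure yourself. A rough Python sketch (absolute timings vary by machine; the paths in the usage comments are placeholders -- point one at a tmpfs mount such as /dev/shm and the other at a disk-backed directory):

```python
import os
import time

def timed_write_fsync(path, data):
    # Time a write+fsync pair; on tmpfs the fsync has nothing to flush.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC)
    start = time.perf_counter()
    os.write(fd, data)
    os.fsync(fd)
    elapsed = time.perf_counter() - start
    os.close(fd)
    os.unlink(path)
    return elapsed

# t_ram = timed_write_fsync("/dev/shm/bench", b"x" * 4096)   # tmpfs
# t_disk = timed_write_fsync("/var/tmp/bench", b"x" * 4096)  # disk-backed
```

On a typical host the tmpfs timing should land in the low microseconds, while the disk-backed path pays the full fsync cost from the table above.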
Under the Hood
Page allocation in shmem.c. When a process writes to a tmpfs file, shmem_getpage_gfp() allocates a page. It first checks the swap cache (the page might have been swapped out earlier), then the page cache (another process might already have the page mapped), and finally allocates a fresh page. This three-level lookup is why tmpfs integrates with both the page cache and swap subsystem.
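The lookup ordering can be modeled with a toy sketch. This illustrates only the three-level search order, not the kernel's actual data structures or API:

```python
def shmem_lookup(index, swap_cache, page_cache, alloc_page):
    """Toy model of shmem_getpage_gfp()'s three-level lookup.

    1. swap cache: the page may have been swapped out earlier
    2. page cache: another process may already have it mapped
    3. otherwise allocate a fresh page and install it
    """
    if index in swap_cache:
        page = swap_cache.pop(index)   # swap-in: move back to page cache
        page_cache[index] = page
        return page
    if index in page_cache:
        return page_cache[index]
    page = alloc_page(index)           # allocation happens only on a miss
    page_cache[index] = page
    return page
```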
Swap interaction. tmpfs pages are added to the swap LRU lists alongside anonymous pages. Under memory pressure, kswapd treats them identically: it selects cold tmpfs pages and writes them to the swap device. This is why tmpfs data survives memory pressure (at the cost of swap I/O latency) while ramfs data cannot be evicted at all. Setting vm.swappiness affects how aggressively tmpfs pages are swapped relative to file-backed page cache pages.
Huge page support. Since kernel 4.7, tmpfs supports transparent huge pages via the huge= mount option. For a 1 GB shared memory segment, using 2 MB huge pages reduces TLB entries from 262,144 to 512, significantly reducing address translation overhead for database buffer pools and scientific computing workloads.
Accounting and limits. tmpfs tracks usage through the shmem_inode_info structure attached to each inode. The superblock tracks total blocks and inodes against the configured limits. This accounting is what makes df work on tmpfs and what enforces ENOSPC when the size limit is hit -- features that ramfs lacks entirely.
Common Questions
Can tmpfs data survive a reboot?
No. tmpfs exists in volatile memory (RAM and swap). Reboot deallocates all pages. There is no journal, no superblock on disk, no recovery mechanism. If data must survive restarts, it belongs on a persistent filesystem. For containers, this means any state written to a tmpfs-backed emptyDir is lost when the pod is rescheduled.
What happens when tmpfs runs out of space?
write() returns -1 with errno set to ENOSPC, exactly like a full disk filesystem. The process receives the same error it would get on ext4 or XFS. This is important: tmpfs space exhaustion looks identical to disk exhaustion from the application's perspective. Monitoring must check df on tmpfs mounts, not just physical disks.
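Because the error is a plain ENOSPC, applications can handle tmpfs exhaustion exactly as they would a full disk. A sketch of one pattern the article's log-rotation example could use (stage_file and the /var/tmp fallback directory are illustrative, not an existing API):

```python
import errno
import os

def stage_file(path, data, fallback_dir="/var/tmp"):
    # Try the (possibly tmpfs-backed) path first; fall back to a
    # disk-backed directory if the mount is full.
    try:
        with open(path, "wb") as f:
            f.write(data)
        return path
    except OSError as e:
        if e.errno != errno.ENOSPC:
            raise
        alt = os.path.join(fallback_dir, os.path.basename(path))
        with open(alt, "wb") as f:
            f.write(data)
        return alt
```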
Is tmpfs faster than a RAM disk (/dev/ram0)?
For most workloads, yes. A RAM disk allocates a fixed block of memory at creation (wasting RAM when underutilized) and still goes through the block I/O layer (adding overhead for request queuing and scheduling). tmpfs bypasses the block layer entirely and allocates pages on demand. The only advantage of a RAM disk is that it presents a block device, which some tools require.
How does tmpfs interact with cgroups memory limits?
tmpfs pages allocated by processes in a cgroup are charged to that cgroup's memory limit. A container with a 1 GB memory limit that writes 800 MB to tmpfs has only 200 MB left for heap, stack, and page cache. This interaction catches operators who size container memory limits based on application RSS alone, forgetting that tmpfs writes also count against the limit.
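On a cgroup v2 host, the charge is visible from inside the container. A sketch (assumes the unified hierarchy is mounted at /sys/fs/cgroup; it returns None where that does not hold):

```python
def cgroup_memory_current(base="/sys/fs/cgroup"):
    # cgroup v2 memory.current counts tmpfs/shmem pages charged to the
    # group, not just process heap and stack.
    try:
        with open(f"{base}/memory.current") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

# usage = cgroup_memory_current()
# if usage is not None, compare it against the container's memory limit
```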
How Technologies Use This
A microservice container runs an image processing pipeline that writes 200 MB of temporary files per request to /tmp. On the default overlay2 filesystem, each write hits the node's SSD, adding 3 ms of latency per file operation and wearing the disk with ephemeral data that is deleted seconds later. Running 40 such containers on a single node produces 8 GB/s of unnecessary disk I/O.
Docker supports the --tmpfs flag (e.g., --tmpfs /tmp:size=64m) to mount a tmpfs filesystem inside the container at /tmp. The container runtime calls mount("tmpfs", "/tmp", "tmpfs", 0, "size=67108864") within the container's mount namespace. All writes to /tmp land in RAM-backed pages managed by the kernel page cache, isolated from other containers by namespace boundaries. Reads happen at memory speed, and the kernel reclaims every page instantly when the container stops, leaving no residual data on the node filesystem.
The size parameter acts as a hard ceiling. When a container tries to write beyond 64 MB, the write syscall returns ENOSPC rather than silently consuming node memory. Without an explicit size limit, tmpfs defaults to 50% of host RAM. On a node with 8 GB total, a single runaway container writing to an unbounded tmpfs can allocate 4 GB before the OOM killer intervenes and potentially terminates unrelated containers sharing the same node.
A Kubernetes pod running a data transformation job needs 2 GB of scratch space for intermediate Parquet files that exist for 30 seconds between pipeline stages. Writing these files to the pod's ephemeral storage on a network-attached EBS volume adds 15 ms per read and contends with other pods for disk IOPS. The job processes 500 batches per hour, and the cumulative I/O wait adds 45 minutes of overhead per day.
Setting the pod's emptyDir volume to medium: Memory tells the kubelet to mount a tmpfs filesystem into the pod. The YAML spec includes a sizeLimit field (e.g., sizeLimit: 2Gi) that the kubelet enforces through periodic polling of the mount's usage. Intermediate files written to this volume reside entirely in RAM pages, accessible at memory bandwidth (tens of GB/s) instead of network storage bandwidth (hundreds of MB/s). When the pod terminates, the kubelet unmounts the tmpfs and the kernel frees all associated pages immediately.
If a pod exceeds its sizeLimit, the kubelet evicts the pod rather than allowing it to consume unbounded node memory. This eviction behavior differs from a container hitting its memory limit: the OOM killer is not involved, and the pod transitions to a Failed state with an explicit eviction reason. Cluster operators pair the sizeLimit with resource requests on the pod to ensure the node scheduler accounts for the RAM consumed by tmpfs mounts in its capacity calculations.
A Redis instance running inside a Docker container shares monitoring data with a Prometheus sidecar container in the same pod. The sidecar reads Redis metrics 10 times per second, and both processes exchange approximately 50,000 small messages per second. Using a Unix domain socket for this communication adds 5 to 10 microseconds of latency per message due to two kernel buffer copies (userspace to kernel on write, kernel to userspace on read).
Both containers access a shared /dev/shm tmpfs mount backed by RAM. Redis writes metric snapshots to a memory-mapped file on /dev/shm using shm_open() and mmap(). The sidecar maps the same file into its address space. Because both mappings point to identical physical pages in the tmpfs filesystem, writes from Redis are visible to the sidecar without any system call on the data path and without copying data through kernel buffers. Latency for a single metric read drops to 50 to 200 nanoseconds when combined with atomic signaling for synchronization.
Docker mounts /dev/shm with a default size of 64 MB per container. If the shared memory segment needs to exceed that limit, the container must be started with --shm-size set to a higher value (e.g., --shm-size=256m), or in Kubernetes, a memory-backed emptyDir volume can be mounted at /dev/shm in the pod spec. Applications that call shm_open() and attempt to ftruncate() beyond the 64 MB default receive ENOSPC and typically crash or fall back to degraded operation without clear error messages.
Same Concept Across Tech
| Technology | How it uses tmpfs | Key gotcha |
|---|---|---|
| Docker | Mounts tmpfs at /dev/shm (64 MB default), supports --tmpfs flag for custom mounts | Default /dev/shm is too small for PostgreSQL, Oracle, or any app using large shared memory segments |
| Kubernetes | emptyDir with medium: Memory creates tmpfs-backed volumes | Without sizeLimit, a pod can consume 50% of node RAM via tmpfs and trigger node-level OOM |
| PostgreSQL | Uses /dev/shm for POSIX shared memory (dynamic_shared_memory_type = posix) | Container /dev/shm must be >= shared_buffers or the database fails to start |
| systemd | Mounts /run as tmpfs for PID files, sockets, and runtime state | Applications that write large files to /run (core dumps, journals) can exhaust the mount |
| Build tools | Bazel, Make, Cargo use tmpfs for output directories to avoid disk I/O | If the build working set exceeds tmpfs size, builds fail with ENOSPC mid-compilation |
Stack layer mapping (container tmpfs space exhaustion):
| Layer | What to check | Tool |
|---|---|---|
| Application | Which files consume the most space in /tmp? | du -sh /tmp/* inside the container |
| Container | What size= was set on the tmpfs mount? | docker inspect or findmnt inside container |
| Runtime | Is the container sharing the host tmpfs or using its own? | findmnt -t tmpfs from host and container |
| Host | How much physical RAM is available for tmpfs pages? | free -h and grep Shmem /proc/meminfo |
| Kernel | Are tmpfs pages being swapped under pressure? | vmstat 1 (check si/so columns) |
Design Rationale
Unix needed a fast, standards-compliant filesystem for temporary data that does not outlive a session. Early approaches used RAM disks (fixed allocation, wasteful) or just wrote to /tmp on disk (slow, required cleanup). tmpfs solved both problems: allocate pages on demand so empty mounts cost nothing, enforce a size ceiling so runaway writes cannot consume all memory, and support swap so data survives memory pressure instead of causing OOM. The combination of demand allocation, size enforcement, and swap integration made tmpfs the universal choice for ephemeral storage across containers, init systems, and shared memory.
If You See This, Think This
| Symptom | Likely cause | First check |
|---|---|---|
| ENOSPC on /tmp but disk has space | /tmp is tmpfs and has hit its size limit | df -h /tmp (check if it is tmpfs and near 100%) |
| Container fails to start with shared memory error | /dev/shm too small (Docker default is 64 MB) | docker inspect for ShmSize or check --shm-size |
| Node OOM after pod scheduling | tmpfs emptyDir without sizeLimit consuming host RAM | kubectl describe node, check for memory pressure; findmnt -t tmpfs |
| /run fills up on long-running server | Services writing large files to /run without cleanup | du -sh /run/* to find culprits |
| Application slower than expected on tmpfs | tmpfs pages swapped to disk under memory pressure | vmstat 1 (si/so columns), free -h to check available RAM |
| Permission denied writing to tmpfs /tmp | Restrictive mode= or ro mount option (noexec blocks execution, not writes) | findmnt /tmp (check mode and ro in OPTIONS) |
When to Use / Avoid
Relevant when:
- Providing fast, ephemeral scratch space for containers, build systems, or tests
- Setting up POSIX shared memory between processes (/dev/shm)
- Storing runtime state that must not survive reboot (/run, PID files, socket files)
- Eliminating disk I/O for intermediate build artifacts or test data
Watch out for:
- tmpfs without size= defaults to 50% of RAM. On memory-constrained hosts this is dangerous
- Docker /dev/shm defaults to 64 MB. Database containers and scientific workloads need more
- ramfs has no size limit and no swap. Never use it for user-writable data in production
- tmpfs data disappears on reboot. Mission-critical data belongs on persistent storage
Try It Yourself
# Create a 512 MB tmpfs mount with /tmp-like permissions
sudo mount -t tmpfs -o size=512m,mode=1777 tmpfs /mnt/fast
# List all tmpfs mounts with options and sizes
findmnt -t tmpfs -o TARGET,SOURCE,FSTYPE,SIZE,OPTIONS
# Check current tmpfs usage across all mounts
df -h -t tmpfs
# Resize /dev/shm without unmounting (live remount)
sudo mount -o remount,size=2G /dev/shm
# Check how much memory tmpfs is consuming system-wide
grep Shmem /proc/meminfo
# Run a Docker container with 256 MB tmpfs at /tmp
docker run --tmpfs /tmp:rw,noexec,nosuid,size=256m alpine df -h /tmp
# Verify the filesystem type of /tmp (0x01021994 = TMPFS_MAGIC)
stat -f -c "%T" /tmp
# Create a POSIX shared memory segment from the command line
dd if=/dev/zero of=/dev/shm/test_segment bs=1M count=10 && ls -lh /dev/shm/test_segment
# Check Docker default shm-size for a running container
docker inspect --format '{{.HostConfig.ShmSize}}' <container_id>
Debug Checklist
1. Check all tmpfs mounts and their sizes: findmnt -t tmpfs -o TARGET,SOURCE,SIZE,OPTIONS
2. Check tmpfs usage: df -h /dev/shm /run /tmp
3. Check if /tmp is tmpfs or disk-backed: stat -f /tmp (type 0x01021994 = tmpfs)
4. List POSIX shared memory segments: ls -la /dev/shm/
5. Check container /dev/shm size: docker inspect <container> | grep ShmSize
6. Check swap usage by tmpfs: swapon --show && grep Shmem /proc/meminfo
7. Verify mount options (noexec, nosuid, size): mount | grep tmpfs
Key Takeaways
- ✓ tmpfs is not a RAM disk. A RAM disk (like /dev/ram0) allocates a fixed block of memory at creation. tmpfs allocates pages on demand and frees them when files are deleted. An empty tmpfs uses zero RAM. A 10 GB tmpfs mount with only 50 MB of files in it uses only 50 MB of physical memory.
- ✓ tmpfs pages can be swapped out. Under memory pressure, the kernel treats tmpfs pages like any other anonymous page and moves them to swap. This means tmpfs data survives memory pressure (it just gets slower), while ramfs data can never be evicted and will cause OOM conditions instead.
- ✓ The size= mount option limits total file data, not resident memory. Setting size=1G means up to 1 GB of file content can exist in the filesystem. If memory is tight, some of those pages live in swap. If nr_inodes= is not set, the default scales with RAM (on the order of one inode per two pages of physical memory).
- ✓ Container runtimes mount tmpfs for /tmp, /run, and /dev/shm inside each container. These are independent tmpfs instances in the container's mount namespace. The size= parameter on each mount is critical -- without it, a single container can consume half of host RAM through tmpfs writes alone.
- ✓ tmpfs supports huge pages via the huge= mount option. Setting huge=within_size or huge=always allows tmpfs to use 2 MB huge pages for large files, reducing TLB pressure during sequential access. This matters for shared memory segments used by databases.
Common Pitfalls
- ✗ Assuming tmpfs data survives a reboot. tmpfs lives in volatile memory (RAM plus swap). Power loss or reboot destroys everything. Data that must survive restarts belongs on a persistent filesystem. This bites container deployments where tmpfs-backed volumes silently lose state during pod rescheduling.
- ✗ Using ramfs in production instead of tmpfs. ramfs has no size limit. A process that writes continuously to a ramfs mount will consume all system memory because ramfs pages cannot be evicted or reclaimed. The OOM killer is the only backstop. tmpfs with an explicit size= limit prevents this.
- ✗ Not setting --shm-size in Docker or a memory-backed emptyDir in Kubernetes for applications that use POSIX shared memory. The default /dev/shm in Docker is 64 MB. PostgreSQL with shared_buffers=256MB will fail to start. Oracle databases, MATLAB, and many MPI-based scientific tools also require larger /dev/shm.
- ✗ Confusing tmpfs size with memory reservation. A tmpfs mounted with size=4G does not reserve 4 GB of RAM. It sets a ceiling. Actual memory use depends on files written. But if processes fill it to 4 GB and memory is scarce, those pages compete with application memory for physical frames, and the result is swap thrashing or OOM kills.
- ✗ Mounting tmpfs without noexec,nosuid when used for temporary data. On security-sensitive systems, tmpfs mounts at /tmp should include noexec and nosuid options to prevent execution of uploaded binaries and privilege escalation through setuid files staged in /tmp.
Reference
In One Line
tmpfs turns RAM into a filesystem with a size limit, swap support, and instant cleanup on delete -- the reason /dev/shm, /run, and every container /tmp mount exists.