tmpfs & ramfs -- In-Memory Filesystems
Mental Model
A whiteboard in a meeting room. Writing and erasing is instant -- no printers, no filing cabinets. The whiteboard has a fixed size (tmpfs with size= limit). If someone fills it up, the next person cannot write until space is cleared. When everyone leaves the room (reboot), the whiteboard is wiped clean. ramfs is the same whiteboard but with no boundary -- people keep taping on extra panels until the room is full and nobody can move.
The Problem
A container running on a host with 4 GB of RAM reports ENOSPC writing to /tmp. The host disk has 200 GB free. The confusion: /tmp is not on disk. It is a tmpfs mount, and the default size is 50% of host RAM -- 2 GB. Three containers share that 2 GB ceiling. One container's log rotation dumps 1.5 GB of compressed archives to /tmp before moving them to S3. That single operation starves the other two containers of tmpfs space. Increasing host disk does nothing. The fix is either setting an explicit size limit per container tmpfs mount or moving the staging directory to a disk-backed path.
Architecture
A file written to /tmp inside a container disappears the moment the container stops. It never existed on disk. The filesystem backing that write was tmpfs -- a filesystem that stores data in RAM, enforces a size limit, and reclaims every page the instant the mount is destroyed.
tmpfs is everywhere in a modern Linux system. It backs /dev/shm for shared memory, /run for runtime state, /tmp in most containers, and the anonymous memory behind mmap(MAP_SHARED|MAP_ANONYMOUS). Understanding how it works -- and how it differs from ramfs -- is the difference between a well-tuned system and one that OOMs under load.
How tmpfs Works
tmpfs is implemented in mm/shmem.c and integrates directly with the kernel's page cache and swap subsystem. Here is the lifecycle of a file on tmpfs:
- A process calls open("/tmp/data", O_CREAT|O_RDWR). The VFS routes the call to shmem_create().
- An inode is allocated from the tmpfs superblock. No disk block is reserved.
- The process calls write(). The shmem_write_begin() function allocates a page from the page cache.
- Data is copied from userspace into the page. The page exists only in RAM.
- Under memory pressure, the kernel can swap tmpfs pages to the swap device, just like anonymous memory pages.
- When the file is deleted or the mount is unmounted, pages are freed immediately. No disk blocks to deallocate, no journal to update.
The key insight: tmpfs pages are demand-allocated. Mounting a 10 GB tmpfs consumes zero memory. Only actual file writes allocate pages. An empty tmpfs mount costs nothing except a superblock and root inode.
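Demand allocation is observable from userspace. A minimal Python sketch (assuming a Linux host; the /dev/shm path in the usage comment is one common tmpfs mount) reads a mount's used and total bytes via statvfs:

```python
import os

def tmpfs_usage(mount_point):
    """Return (used_bytes, total_bytes) for a mounted filesystem.

    On a tmpfs mount, total_bytes reflects the size= ceiling, while
    used_bytes grows only as pages are actually allocated by writes.
    """
    st = os.statvfs(mount_point)
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return used, total

# Example (on a Linux host): a freshly mounted 10 GB tmpfs reports
# total of ~10 GB but used of ~0 -- mounting reserves nothing.
# used, total = tmpfs_usage("/dev/shm")
```

Running this before and after writing a file to a tmpfs mount shows used climbing by exactly the file's size, and dropping back to the previous value the moment the file is deleted.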
tmpfs vs ramfs
These two filesystems look identical from userspace but have fundamentally different resource management:
| Feature | tmpfs | ramfs |
|---|---|---|
| Size limit | Yes (size= mount option) | No (grows until OOM) |
| Swap support | Yes (pages can be swapped) | No (pages pinned in RAM) |
| Accounting | Yes (shows in df, /proc/meminfo) | No (invisible to df) |
| Default size | 50% of RAM | Unlimited |
| Production use | /dev/shm, /run, /tmp | Almost never |
| Implementation | mm/shmem.c (complex) | fs/ramfs/ (trivial) |
ramfs was the original RAM-backed filesystem in Linux. It served as a proof of concept: any page cache-backed filesystem with no writeback path automatically keeps data in RAM. But without size enforcement or swap support, ramfs is dangerous for any user-writable mount. tmpfs replaced it for every production use case.
The Mount Options That Matter
mount -t tmpfs -o size=1G,nr_inodes=10000,mode=1777,noexec,nosuid tmpfs /mnt/scratch
- size=1G: Maximum total file data. Can be specified as percentage (size=50%). Default is 50% of RAM. This is a ceiling, not a reservation.
- nr_inodes=10000: Maximum number of files and directories. Default scales with RAM. Set this on mounts exposed to untrusted code to prevent inode exhaustion attacks.
- mode=1777: The sticky bit, matching /tmp semantics. All users can create files, but only owners can delete them.
- noexec: Prevents execution of binaries on the mount. Critical for /tmp on security-hardened systems.
- nosuid: Ignores setuid/setgid bits. Prevents privilege escalation via files staged in tmpfs.
- huge=within_size: Enables transparent huge page support for tmpfs. Reduces TLB misses for large shared memory segments.
Live resizing without unmount:
# Expand /dev/shm to 2 GB while it is in use
mount -o remount,size=2G /dev/shm
This is non-disruptive. Existing files and mappings are unaffected. Shrinking below current usage fails with EBUSY.
/dev/shm and POSIX Shared Memory
/dev/shm is a tmpfs mount that provides the backing store for POSIX shared memory. When a process calls shm_open("/my_buffer", O_CREAT, 0600), the C library creates a file at /dev/shm/my_buffer. The process then calls ftruncate() to set the size and mmap() to map it into its address space.
A second process opens the same name with shm_open() and maps it. Both processes now have virtual addresses pointing to the same physical pages -- zero-copy, zero-syscall data sharing on the data path.
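The same zero-copy pattern can be sketched in Python with multiprocessing.shared_memory, which on Linux creates its backing file under /dev/shm just as shm_open() does. The segment name demo_metrics is arbitrary, and this is a single-process sketch of what the two cooperating processes each do:

```python
from multiprocessing import shared_memory

def publish(name, payload):
    # Writer side: roughly what shm_open() + ftruncate() + mmap() do in C.
    # On Linux this creates /dev/shm/<name> backed by tmpfs pages.
    seg = shared_memory.SharedMemory(create=True, size=len(payload), name=name)
    seg.buf[: len(payload)] = payload
    return seg

def read_back(name, n):
    # Reader side: attaches to the same physical pages -- no data copy
    # through kernel buffers on the read path.
    seg = shared_memory.SharedMemory(name=name)
    data = bytes(seg.buf[:n])
    seg.close()
    return data

# writer = publish("demo_metrics", b"snapshot-1")
# read_back("demo_metrics", 10)  # -> b"snapshot-1"
# writer.close(); writer.unlink()  # unlink removes /dev/shm/demo_metrics
```

The unlink() step matters: like any tmpfs file, a named segment persists until explicitly removed or the mount goes away, so a crashed writer can leave segments behind in /dev/shm.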
# Check /dev/shm usage
df -h /dev/shm
# List active shared memory segments
ls -la /dev/shm/
# Check the Docker default (64 MB)
docker inspect --format '{{.HostConfig.ShmSize}}' my_container
The 64 MB default in Docker is the single most common cause of shared memory failures in containers. PostgreSQL, Oracle, MATLAB, and MPI-based applications all require larger segments.
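A preflight check at startup can fail fast with a clear message instead of crashing mid-run. A hedged sketch (assumes a Linux container with /dev/shm mounted; the 256 MB figure is an illustrative requirement, and check_shm is a hypothetical helper, not part of any library):

```python
import os

def shm_capacity(path="/dev/shm"):
    # Total size of the tmpfs mount backing POSIX shared memory.
    st = os.statvfs(path)
    return st.f_frsize * st.f_blocks

def check_shm(required_bytes, path="/dev/shm"):
    # Raise early if the mount's size= ceiling is below what we need.
    cap = shm_capacity(path)
    if cap < required_bytes:
        raise RuntimeError(
            f"{path} is {cap} bytes but {required_bytes} are needed; "
            "start the container with a larger --shm-size"
        )

# check_shm(256 * 1024 * 1024)  # e.g. an app needing 256 MB of shared memory
```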
/run and Runtime State
systemd mounts /run as tmpfs early in boot, before the root filesystem is mounted read-write. This solves the stale PID file problem: if a daemon crashes and leaves /var/run/sshd.pid behind, the next boot would see the file and think sshd is already running. With /run on tmpfs, every boot starts clean.
# Typical /run contents
ls /run/
# Output: lock/ systemd/ user/ sshd.pid dbus/ ...
# Check /run mount and size
findmnt /run
# TARGET SOURCE FSTYPE OPTIONS
# /run tmpfs tmpfs rw,nosuid,nodev,noexec,size=1612860k,mode=755
Containers and tmpfs
Container runtimes create tmpfs mounts inside the container's mount namespace for isolation and performance:
Docker:
# Custom tmpfs at /tmp with size limit and security options
docker run --tmpfs /tmp:rw,noexec,nosuid,size=256m alpine sh
# Override /dev/shm size for database containers
docker run --shm-size=1g postgres:16
Kubernetes:
volumes:
- name: scratch
emptyDir:
medium: Memory # Creates a tmpfs-backed volume
sizeLimit: 512Mi # Enforced by kubelet eviction
Without sizeLimit, a Kubernetes emptyDir with medium: Memory defaults to 50% of node RAM. A pod writing unbounded data to this volume can trigger node-level memory pressure and affect every pod on the node.
Performance Characteristics
tmpfs eliminates the entire block I/O stack. A write() to tmpfs copies data from userspace to a page cache page. There is no block layer, no I/O scheduler, no device driver, no disk seek, no write-ahead log. The latency profile:
| Operation | tmpfs | ext4 (SSD) | ext4 (HDD) |
|---|---|---|---|
| 4 KB write | 0.5-2 us | 10-50 us | 2-10 ms |
| 4 KB read | 0.3-1 us | 5-20 us | 5-15 ms |
| fsync | no-op | 50-500 us | 5-30 ms |
| Sequential R/W | ~50 GB/s | 500 MB-3 GB/s | 100-200 MB/s |
fsync() on tmpfs is a no-op because there is no durable storage to flush to. This makes tmpfs unsuitable for any data that must survive power loss, but ideal for scratch data where fsync overhead is pure waste.
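The gap is easy to measure yourself. A rough Python sketch (absolute timings vary by machine; the paths in the usage comments are placeholders -- point one at a tmpfs mount such as /dev/shm and the other at a disk-backed directory):

```python
import os
import time

def timed_write_fsync(path, data):
    # Time a write+fsync pair; on tmpfs the fsync has nothing to flush.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC)
    start = time.perf_counter()
    os.write(fd, data)
    os.fsync(fd)
    elapsed = time.perf_counter() - start
    os.close(fd)
    os.unlink(path)
    return elapsed

# t_ram = timed_write_fsync("/dev/shm/bench", b"x" * 4096)   # tmpfs
# t_disk = timed_write_fsync("/var/tmp/bench", b"x" * 4096)  # disk-backed
```

On a typical host the tmpfs timing should land in the low microseconds, while the disk-backed path pays the full fsync cost from the table above.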
Under the Hood
Page allocation in shmem.c. When a process writes to a tmpfs file, shmem_getpage_gfp() allocates a page. It first checks the swap cache (the page might have been swapped out earlier), then the page cache (another process might already have the page mapped), and finally allocates a fresh page. This three-level lookup is why tmpfs integrates with both the page cache and swap subsystem.
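The lookup ordering can be modeled with a toy sketch. This illustrates only the three-level search order, not the kernel's actual data structures or API:

```python
def shmem_lookup(index, swap_cache, page_cache, alloc_page):
    """Toy model of shmem_getpage_gfp()'s three-level lookup.

    1. swap cache: the page may have been swapped out earlier
    2. page cache: another process may already have it mapped
    3. otherwise allocate a fresh page and install it
    """
    if index in swap_cache:
        page = swap_cache.pop(index)   # swap-in: move back to page cache
        page_cache[index] = page
        return page
    if index in page_cache:
        return page_cache[index]
    page = alloc_page(index)           # allocation happens only on a miss
    page_cache[index] = page
    return page
```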
Swap interaction. tmpfs pages are added to the swap LRU lists alongside anonymous pages. Under memory pressure, kswapd treats them identically: it selects cold tmpfs pages and writes them to the swap device. This is why tmpfs data survives memory pressure (at the cost of swap I/O latency) while ramfs data cannot be evicted at all. Setting vm.swappiness affects how aggressively tmpfs pages are swapped relative to file-backed page cache pages.
Huge page support. Since kernel 4.7, tmpfs supports transparent huge pages via the huge= mount option. For a 1 GB shared memory segment, using 2 MB huge pages reduces TLB entries from 262,144 to 512, significantly reducing address translation overhead for database buffer pools and scientific computing workloads.
Accounting and limits. tmpfs tracks usage through the shmem_inode_info structure attached to each inode. The superblock tracks total blocks and inodes against the configured limits. This accounting is what makes df work on tmpfs and what enforces ENOSPC when the size limit is hit -- features that ramfs lacks entirely.
Common Questions
Can tmpfs data survive a reboot?
No. tmpfs exists in volatile memory (RAM and swap). Reboot deallocates all pages. There is no journal, no superblock on disk, no recovery mechanism. If data must survive restarts, it belongs on a persistent filesystem. For containers, this means any state written to a tmpfs-backed emptyDir is lost when the pod is rescheduled.
What happens when tmpfs runs out of space?
write() returns -1 with errno set to ENOSPC, exactly like a full disk filesystem. The process receives the same error it would get on ext4 or XFS. This is important: tmpfs space exhaustion looks identical to disk exhaustion from the application's perspective. Monitoring must check df on tmpfs mounts, not just physical disks.
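Because the error is a plain ENOSPC, applications can handle tmpfs exhaustion exactly as they would a full disk. A sketch of one pattern the article's log-rotation example could use (stage_file and the /var/tmp fallback directory are illustrative, not an existing API):

```python
import errno
import os

def stage_file(path, data, fallback_dir="/var/tmp"):
    # Try the (possibly tmpfs-backed) path first; fall back to a
    # disk-backed directory if the mount is full.
    try:
        with open(path, "wb") as f:
            f.write(data)
        return path
    except OSError as e:
        if e.errno != errno.ENOSPC:
            raise
        alt = os.path.join(fallback_dir, os.path.basename(path))
        with open(alt, "wb") as f:
            f.write(data)
        return alt
```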
Is tmpfs faster than a RAM disk (/dev/ram0)?
For most workloads, yes. A RAM disk allocates a fixed block of memory at creation (wasting RAM when underutilized) and still goes through the block I/O layer (adding overhead for request queuing and scheduling). tmpfs bypasses the block layer entirely and allocates pages on demand. The only advantage of a RAM disk is that it presents a block device, which some tools require.
How does tmpfs interact with cgroups memory limits?
tmpfs pages allocated by processes in a cgroup are charged to that cgroup's memory limit. A container with a 1 GB memory limit that writes 800 MB to tmpfs has only 200 MB left for heap, stack, and page cache. This interaction catches operators who size container memory limits based on application RSS alone, forgetting that tmpfs writes also count against the limit.
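On a cgroup v2 host, the charge is visible from inside the container. A sketch (assumes the unified hierarchy is mounted at /sys/fs/cgroup; it returns None where that does not hold):

```python
def cgroup_memory_current(base="/sys/fs/cgroup"):
    # cgroup v2 memory.current counts tmpfs/shmem pages charged to the
    # group, not just process heap and stack.
    try:
        with open(f"{base}/memory.current") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

# usage = cgroup_memory_current()
# if usage is not None, compare it against the container's memory limit
```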
How Technologies Use This
A microservice container runs an image processing pipeline that writes 200 MB of temporary files per request to /tmp. On the default overlay2 filesystem, each write hits the node's SSD, adding 3 ms of latency per file operation and wearing the disk with ephemeral data that is deleted seconds later. Running 40 such containers on a single node produces 8 GB/s of unnecessary disk I/O.
Docker supports the --tmpfs flag (e.g., --tmpfs /tmp:size=64m) to mount a tmpfs filesystem inside the container at /tmp. The container runtime calls mount("tmpfs", "/tmp", "tmpfs", 0, "size=67108864") within the container's mount namespace. All writes to /tmp land in RAM-backed pages managed by the kernel page cache, isolated from other containers by namespace boundaries. Reads happen at memory speed, and the kernel reclaims every page instantly when the container stops, leaving no residual data on the node filesystem.
The size parameter acts as a hard ceiling. When a container tries to write beyond 64 MB, the write syscall returns ENOSPC rather than silently consuming node memory. Without an explicit size limit, tmpfs defaults to 50% of host RAM. On a node with 8 GB total, a single runaway container writing to an unbounded tmpfs can allocate 4 GB before the OOM killer intervenes and potentially terminates unrelated containers sharing the same node.
A Kubernetes pod running a data transformation job needs 2 GB of scratch space for intermediate Parquet files that exist for 30 seconds between pipeline stages. Writing these files to the pod's ephemeral storage on a network-attached EBS volume adds 15 ms per read and contends with other pods for disk IOPS. The job processes 500 batches per hour, and the cumulative I/O wait adds 45 minutes of overhead per day.
Setting the pod's emptyDir volume to medium: Memory tells the kubelet to mount a tmpfs filesystem into the pod. The YAML spec includes a sizeLimit field (e.g., sizeLimit: 2Gi) that the kubelet enforces through periodic polling of the mount's usage. Intermediate files written to this volume reside entirely in RAM pages, accessible at memory bandwidth (tens of GB/s) instead of network storage bandwidth (hundreds of MB/s). When the pod terminates, the kubelet unmounts the tmpfs and the kernel frees all associated pages immediately.
If a pod exceeds its sizeLimit, the kubelet evicts the pod rather than allowing it to consume unbounded node memory. This eviction behavior differs from a container hitting its memory limit: the OOM killer is not involved, and the pod transitions to a Failed state with an explicit eviction reason. Cluster operators pair the sizeLimit with resource requests on the pod to ensure the node scheduler accounts for the RAM consumed by tmpfs mounts in its capacity calculations.
A Redis instance running inside a Docker container shares monitoring data with a Prometheus sidecar container in the same pod. The sidecar reads Redis metrics 10 times per second, and both processes exchange approximately 50,000 small messages per second. Using a Unix domain socket for this communication adds 5 to 10 microseconds of latency per message due to two kernel buffer copies (userspace to kernel on write, kernel to userspace on read).
Both containers access a shared /dev/shm tmpfs mount backed by RAM. Redis writes metric snapshots to a memory-mapped file on /dev/shm using shm_open() and mmap(). The sidecar maps the same file into its address space. Because both mappings point to identical physical pages in the tmpfs filesystem, writes from Redis are visible to the sidecar without any system call on the data path and without copying data through kernel buffers. Latency for a single metric read drops to 50 to 200 nanoseconds when combined with atomic signaling for synchronization.
Docker mounts /dev/shm with a default size of 64 MB per container. If the shared memory segment needs to exceed that limit, the container must be started with --shm-size set to a higher value (e.g., --shm-size=256m), or in Kubernetes, a memory-backed emptyDir volume can be mounted at /dev/shm in the pod spec. Applications that call shm_open() and attempt to ftruncate() beyond the 64 MB default receive ENOSPC and typically crash or fall back to degraded operation without clear error messages.
Same Concept Across Tech
| Technology | How it uses tmpfs | Key gotcha |
|---|---|---|
| Docker | Mounts tmpfs at /dev/shm (64 MB default), supports --tmpfs flag for custom mounts | Default /dev/shm is too small for PostgreSQL, Oracle, or any app using large shared memory segments |
| Kubernetes | emptyDir with medium: Memory creates tmpfs-backed volumes | Without sizeLimit, a pod can consume 50% of node RAM via tmpfs and trigger node-level OOM |
| PostgreSQL | Uses /dev/shm for POSIX shared memory (dynamic_shared_memory_type = posix) | Container /dev/shm must be >= shared_buffers or the database fails to start |
| systemd | Mounts /run as tmpfs for PID files, sockets, and runtime state | Applications that write large files to /run (core dumps, journals) can exhaust the mount |
| Build tools | Bazel, Make, Cargo use tmpfs for output directories to avoid disk I/O | If the build working set exceeds tmpfs size, builds fail with ENOSPC mid-compilation |
Stack layer mapping (container tmpfs space exhaustion):
| Layer | What to check | Tool |
|---|---|---|
| Application | Which files consume the most space in /tmp? | du -sh /tmp/* inside the container |
| Container | What size= was set on the tmpfs mount? | docker inspect or findmnt inside container |
| Runtime | Is the container sharing the host tmpfs or using its own? | findmnt -t tmpfs from host and container |
| Host | How much physical RAM is available for tmpfs pages? | free -h and grep Shmem /proc/meminfo |
| Kernel | Are tmpfs pages being swapped under pressure? | vmstat 1 (check si/so columns) |
Design Rationale
Unix needed a fast, standards-compliant filesystem for temporary data that does not outlive a session. Early approaches used RAM disks (fixed allocation, wasteful) or just wrote to /tmp on disk (slow, required cleanup). tmpfs solved both problems: allocate pages on demand so empty mounts cost nothing, enforce a size ceiling so runaway writes cannot consume all memory, and support swap so data survives memory pressure instead of causing OOM. The combination of demand allocation, size enforcement, and swap integration made tmpfs the universal choice for ephemeral storage across containers, init systems, and shared memory.
If You See This, Think This
| Symptom | Likely cause | First check |
|---|---|---|
| ENOSPC on /tmp but disk has space | /tmp is tmpfs and has hit its size limit | df -h /tmp (check if it is tmpfs and near 100%) |
| Container fails to start with shared memory error | /dev/shm too small (Docker default is 64 MB) | docker inspect for ShmSize or check --shm-size |
| Node OOM after pod scheduling | tmpfs emptyDir without sizeLimit consuming host RAM | kubectl describe node, check for memory pressure; findmnt -t tmpfs |
| /run fills up on long-running server | Services writing large files to /run without cleanup | du -sh /run/* to find culprits |
| Application slower than expected on tmpfs | tmpfs pages swapped to disk under memory pressure | vmstat 1 (si/so columns), free -h to check available RAM |
| Permission denied writing to tmpfs /tmp | Restrictive mode= or ro mount option (noexec blocks execution, not writes) | findmnt /tmp (check mode and ro in OPTIONS) |
When to Use / Avoid
Relevant when:
- Providing fast, ephemeral scratch space for containers, build systems, or tests
- Setting up POSIX shared memory between processes (/dev/shm)
- Storing runtime state that must not survive reboot (/run, PID files, socket files)
- Eliminating disk I/O for intermediate build artifacts or test data
Watch out for:
- tmpfs without size= defaults to 50% of RAM. On memory-constrained hosts this is dangerous
- Docker /dev/shm defaults to 64 MB. Database containers and scientific workloads need more
- ramfs has no size limit and no swap. Never use it for user-writable data in production
- tmpfs data disappears on reboot. Mission-critical data belongs on persistent storage
Try It Yourself
# Create a 512 MB tmpfs mount with /tmp-like permissions
sudo mount -t tmpfs -o size=512m,mode=1777 tmpfs /mnt/fast
# List all tmpfs mounts with options and sizes
findmnt -t tmpfs -o TARGET,SOURCE,FSTYPE,SIZE,OPTIONS
# Check current tmpfs usage across all mounts
df -h -t tmpfs
# Resize /dev/shm without unmounting (live remount)
sudo mount -o remount,size=2G /dev/shm
# Check how much memory tmpfs is consuming system-wide
grep Shmem /proc/meminfo
# Run a Docker container with 256 MB tmpfs at /tmp
docker run --tmpfs /tmp:rw,noexec,nosuid,size=256m alpine df -h /tmp
# Verify the filesystem type of /tmp (0x01021994 = TMPFS_MAGIC)
stat -f -c "%T" /tmp
# Create a POSIX shared memory segment from the command line
dd if=/dev/zero of=/dev/shm/test_segment bs=1M count=10 && ls -lh /dev/shm/test_segment
# Check Docker default shm-size for a running container
docker inspect --format '{{.HostConfig.ShmSize}}' <container_id>
Debug Checklist
1. Check all tmpfs mounts and their sizes: findmnt -t tmpfs -o TARGET,SOURCE,SIZE,OPTIONS
2. Check tmpfs usage: df -h /dev/shm /run /tmp
3. Check if /tmp is tmpfs or disk-backed: stat -f /tmp (type 0x01021994 = tmpfs)
4. List POSIX shared memory segments: ls -la /dev/shm/
5. Check container /dev/shm size: docker inspect <container> | grep ShmSize
6. Check swap usage by tmpfs: swapon --show && grep Shmem /proc/meminfo
7. Verify mount options (noexec, nosuid, size): mount | grep tmpfs
Key Takeaways
- ✓ tmpfs is not a RAM disk. A RAM disk (like /dev/ram0) allocates a fixed block of memory at creation. tmpfs allocates pages on demand and frees them when files are deleted. An empty tmpfs uses zero RAM. A 10 GB tmpfs mount with only 50 MB of files in it uses only 50 MB of physical memory.
- ✓ tmpfs pages can be swapped out. Under memory pressure, the kernel treats tmpfs pages like any other anonymous page and moves them to swap. This means tmpfs data survives memory pressure (it just gets slower), while ramfs data can never be evicted and will cause OOM conditions instead.
- ✓ The size= mount option limits total file data, not resident memory. Setting size=1G means up to 1 GB of file content can exist in the filesystem. If memory is tight, some of those pages live in swap. If nr_inodes= is not set, the default scales with RAM (on the order of one inode per two pages of physical memory).
- ✓ Container runtimes mount tmpfs for /tmp, /run, and /dev/shm inside each container. These are independent tmpfs instances in the container's mount namespace. The size= parameter on each mount is critical -- without it, a single container can consume half of host RAM through tmpfs writes alone.
- ✓ tmpfs supports huge pages via the huge= mount option. Setting huge=within_size or huge=always allows tmpfs to use 2 MB huge pages for large files, reducing TLB pressure during sequential access. This matters for shared memory segments used by databases.
Common Pitfalls
- ✗ Assuming tmpfs data survives a reboot. tmpfs lives in volatile memory (RAM plus swap). Power loss or reboot destroys everything. Data that must survive restarts belongs on a persistent filesystem. This bites container deployments where tmpfs-backed volumes silently lose state during pod rescheduling.
- ✗ Using ramfs in production instead of tmpfs. ramfs has no size limit. A process that writes continuously to a ramfs mount will consume all system memory because ramfs pages cannot be evicted or reclaimed. The OOM killer is the only backstop. tmpfs with an explicit size= limit prevents this.
- ✗ Not setting --shm-size in Docker or a memory-backed emptyDir in Kubernetes for applications that use POSIX shared memory. The default /dev/shm in Docker is 64 MB. PostgreSQL with shared_buffers=256MB will fail to start. Oracle databases, MATLAB, and many MPI-based scientific tools also require larger /dev/shm.
- ✗ Confusing tmpfs size with memory reservation. A tmpfs mounted with size=4G does not reserve 4 GB of RAM. It sets a ceiling. Actual memory use depends on files written. But if processes fill it to 4 GB and memory is scarce, those pages compete with application memory for physical frames, and the result is swap thrashing or OOM kills.
- ✗ Mounting tmpfs without noexec,nosuid when used for temporary data. On security-sensitive systems, tmpfs mounts at /tmp should include noexec and nosuid options to prevent execution of uploaded binaries and privilege escalation through setuid files staged in /tmp.
Reference
In One Line
tmpfs turns RAM into a filesystem with a size limit, swap support, and instant cleanup on delete -- the reason /dev/shm, /run, and every container /tmp mount exists.