In-Memory File System
Composite pattern for hierarchical file/directory structure with path resolution. Files and directories share a common interface, so recursive operations like size calculation and deletion work uniformly across the tree.
Key Abstractions
Facade with root directory and path resolution. All operations go through here.
Abstract base for File and Directory. Enables uniform treatment across the tree.
Leaf node holding a name and string content
Composite node with a children map for O(1) child lookup by name
Class Diagram
The Key Insight
A file system is a tree. That sounds obvious, but the design implications run deep. You have two fundamentally different types of nodes: files that hold data and directories that hold other nodes. The naive approach treats them as completely separate types and checks which one you have before every operation. This works until you need to compute the total size of a directory, recursively delete a subtree, or search for files matching a pattern. Suddenly every operation needs its own tree-walking code with type checks at every level.
Composite pattern eliminates this entirely. File and Directory both extend a common FileSystemEntry. When you call size() on a directory, it sums up the sizes of its children. Some children are files that return their content length. Others are directories that recurse further. The caller never needs to know the difference. One method call, uniform behavior, regardless of tree depth.
The other critical decision is how directories store their children. A list forces you to scan every entry when looking up a name. That is O(n) per lookup and it compounds fast in deep paths. A HashMap gives O(1) lookups by name, which means path resolution is O(depth) rather than O(depth * breadth).
Requirements
Functional
- Create directories (mkdir) and files (touch) at any valid path
- Write content to files and read it back (write, cat)
- List directory contents (ls) sorted alphabetically
- Remove files and directories (rm), with directory removal being recursive
- Compute size of any entry: files return content length, directories return recursive total
- Search for entries by name across the tree (find)
Non-Functional
- O(1) child lookup within a directory using HashMap
- Path resolution fails fast with clear error messages for missing segments
- Adding new entry types (symlinks, mount points) should not require modifying existing operations
Design Decisions
Why Composite pattern instead of a type flag?
A type flag means every operation has if entry.type == FILE ... else if entry.type == DIRECTORY ... branching. That works for two types but breaks open/closed principle hard. Want to add symbolic links? Touch every method. Composite uses polymorphism. Each subclass implements its own behavior. Directory's size() recurses. File's size() returns content length. Client code calls entry.size() and the runtime dispatches correctly.
Why HashMap for children instead of a List?
Looking up a child by name in a list means scanning every entry. A directory with 10 files? No problem. A directory with 10,000 log files? Every ls, cat, and mkdir operation inside that directory takes linear time. HashMap gives O(1) lookups by name. The sorted output for ls is a one-time sort on the keys, and that only happens when someone actually asks for a listing.
Why split path resolution into resolve() and resolveParent()?
Most operations need either the entry itself (cat, ls, size) or the parent directory plus the child name (mkdir, touch, rm). Having both methods avoids duplicating the tree-walking logic. resolveParent() is the workhorse for mutations because you need the parent to add or remove a child. resolve() handles reads. Clean separation.
Why a facade instead of exposing Entry directly?
Without a facade, callers need to understand path parsing, tree traversal, and entry types. They would manually split paths, walk directories, cast entries to the right type. The FileSystem facade absorbs all that complexity. Callers pass a path string and get a result. Path validation, traversal, and error handling happen in one place.
Interview Follow-ups
- "How would you add symbolic links?" Create a SymLink class extending FileSystemEntry that stores a target path string. On any operation, resolve the symlink by following the target path. Keep a visited set during resolution to detect circular references.
- "How would you support file permissions?" Add a Permissions object to FileSystemEntry with read, write, and execute flags plus an owner identifier. Check permissions at the FileSystem facade layer before every operation.
- "What about undo support for delete?" Command pattern. Wrap each rm operation in a command object that captures the deleted subtree. Push commands onto a stack. Undo pops and re-attaches the subtree to its former parent.
- "How would you handle concurrent access safely?" Add a ReadWriteLock at the FileSystem level. Read operations (cat, ls, size, find) take a read lock. Write operations (mkdir, touch, write, rm) take a write lock. This allows concurrent reads while serializing writes.
45-Minute LLD Playbook
Each phase is what the interviewer expects you to do and say. Concrete steps, not topic hints. Diagrams are what you would sketch on the board.
- 15 min
Clarify the API surface and lock the children-storage choice
GoalPin down the seven operations (mkdir, touch, write, cat, ls, rm, size) plus find, and commit to HashMap-keyed children for O(1) name lookup. Then ask the interviewer how they want to see the design so you can budget the remaining 40 minutes.
Do & Say- ASK·1Open with the API: mkdir, touch, write, cat, ls, rm, size, find. ls returns sorted names. rm on a directory is recursive, no orphans. size on a directory recurses, on a file returns content length.
- SAY·2Lock the children storage: Directory.children is a HashMap keyed by name. O(1) lookup per path segment means a path of depth k resolves in O(k). A list would be O(name count) per segment, which compounds painfully on deep trees with wide directories.
- SAY·3Lock the Composite choice: File and Directory both extend FileSystemEntry abstract. Every operation on the tree (size, rm, find) uses polymorphism, never isinstance branches. Adding SymLink later means one new class, zero edits.
- SAY·4Park scope: Symlinks, permissions, concurrent access, and undo are extensions. v1 is the seven ops plus find on a single-threaded tree.
- ASK·5Ask the process question before drawing: Want me to draw the class hierarchy with Composite labeled and a sample tree on the board, or jump to code where resolve walks the tree? I will hit Composite and Iterator either way.
Interviewer is grading: HashMap-keyed children with the O(1) reasoning. Composite committed to with no isinstance branches. You ask the diagram-vs-code question unprompted.
- 25-10 min
Sketch the FileSystemEntry hierarchy and the path resolver split
GoalLay out the three classes (FileSystemEntry abstract, File leaf, Directory composite) and the two resolver methods (resolve for reads, resolve_parent for writes). Lock the path-splitting rule before any code.
Do & Say- DRAW·1Draw the hierarchy: FileSystemEntry abstract with name, created_at, is_directory, size. File leaf with content. Directory composite with children dict, add_child, remove_child, get_child, list_children sorted.
- SAY·2Tag the patterns: Composite at FileSystemEntry as the shared interface. Iterator on find that walks the tree recursively.
- SAY·3Lock the path-splitting rule: Strip leading and trailing slash, split on slash, filter empty segments. Both /home/alice/ and home/alice and /home/alice resolve to the same three-segment path. Empty path means root.
- WRITE·4Write the resolver split: resolve(path) walks the full path and returns the entry at the end. resolve_parent(path) walks all but the last segment and returns (parent_directory, last_name). resolve is for reads (cat, ls, size), resolve_parent is for writes (mkdir, touch, rm) because writes need to mutate the parent.
- SAY·5Verbalize the recursion in size: Directory.size sums child.size for each child. File.size returns content length. Polymorphic dispatch. The caller never asks what type each child is.
- SAY·6Verbalize the recursion in find: find_recursive checks the entry name against target, appends to results on match. Then if directory, recurses into each child with the extended path. Iterator pattern on the tree.
- SAY·7If a diagram was requested, draw it now with the sample tree (home, alice, docs, notes.txt). Otherwise verbalize and move on.
Interviewer is grading: resolve and resolve_parent split is named with the read-versus-write reasoning. Polymorphic size is committed to with no type checks. Path splitting is canonicalized at the boundary.
- 325 min
Code bottom-up matching the tree structure
GoalType the code in dependency order so FileSystemEntry exists before File and Directory, and both exist before FileSystem. Talk through every polymorphic call and every error case as you write.
Do & Say- SAY·1Start with FileSystemEntry abstract. Constructor takes name, sets _name and _created_at. name property. Abstract is_directory and size. Say: Composite root. Every subclass implements is_directory and size. Caller never branches. (~2 min)
- SAY·2Code File extending FileSystemEntry. Constructor takes name and optional content. is_directory returns False. size returns len(content). content property with setter. Say: Leaf. size is content length. (~2 min)
- SAY·3Code Directory basics. Constructor takes name. _children is a dict. is_directory returns True. add_child raises FileExistsError on name collision, else inserts. remove_child raises FileNotFoundError if missing, else pops and returns. (~2 min)
- SAY·4Code Directory reads and size. get_child returns from dict or None. list_children returns sorted keys. get_all_children returns a dict copy. size recurses summing child.size. (~2 min)
- SAY·5Say while coding Directory: Composite. Recursion is one line because polymorphism does the dispatch. (~1 min)
- SAY·6Code FileSystem init and _split_path. Root is Directory(slash). _split_path strips and filters segments. (~1 min)
- SAY·7Code _resolve. Handles the slash case explicitly, else walks segments. If current is not a directory mid-walk, raises FileNotFoundError. If get_child returns None, raises. (~2 min)
- SAY·8Code _resolve_parent and explain. Walks parts[:-1] and returns (parent_dir, last_name). Both raise on bad paths with clear messages. Say: resolve_parent for writes because mutations need the parent. resolve for reads. (~2 min)
- SAY·9Code mkdir(path): resolve_parent, add_child(Directory(name)). touch(path): resolve_parent, add_child(File(name)) if not exists. write(path, content): resolve, raise if directory, set content. cat(path): resolve, raise if directory, return content. Say: Every mutation goes through resolve_parent. Every read through resolve. Path validation is single-sourced. (~4 min)
- SAY·10Code ls(path=slash): resolve, if directory return list_children, else return [entry.name]. rm(path): resolve_parent, remove_child(name). rm on a directory drops its entire subtree, which is automatic because Python garbage collects unreachable objects. Say: rm is recursive by gc. We unlink one pointer, the subtree falls off the tree. (~3 min)
- SAY·11Code size(path=slash): resolve and call .size(). One line, but the recursion happens inside Directory.size. Say: This is the Composite payoff. Three characters of code do the entire subtree sum. (~1 min)
- SAY·12Code find(path, name) that calls _find_recursive(entry, current_path, target, results). The helper checks entry.name == target then if directory iterates children with extended paths and recurses. Say: Iterator on the tree. Returns paths, not entries, so the caller can resolve later if needed. (~3 min)
- SAY·13Walk through the build steps. mkdir /home, /home/alice, /home/alice/docs, /home/alice/projects, /home/bob. touch /home/alice/docs/notes.txt then write 28 chars. (~0.5 min)
- SAY·14Walk through reads and rm. ls /home returns [alice, bob]. cat notes.txt returns 28 chars. size /home/alice/docs sums to the total content length. find returns matching paths. rm /home/alice/docs drops the whole subtree. (~0.5 min)
Interviewer is grading: Directory.size is one recursive sum, no isinstance. resolve_parent is the single mutation entry point. rm on a directory drops the subtree without explicit recursion because gc handles it. find returns paths not entries.
- 45 min
Trade-offs, extensions, and wrap-up
GoalName two trade-offs you defend, volunteer two extensions you would build next, close with one sentence summarizing the design.
Do & Say- SAY·1Trade-off one, HashMap children over a list: A list keeps insertion order for free and uses less memory per entry. But ls, cat, and mkdir all do name lookups. On a 10,000-file directory the list scan is O(n) per call.
- SAY·2Defend HashMap: HashMap is O(1) per lookup. The trade is one hash per lookup, negligible for short names. ls sorts the keys on demand.
- SAY·3Trade-off two, Composite over a tagged union: A type flag plus if-else works for two types. Adding SymLink, MountPoint, or DeviceFile means touching every operation. Composite uses polymorphism, each new entry type is one new class implementing is_directory and size. The cost is one abstract method per shared op.
- SAY·4Extension one, symbolic links: SymLink extends FileSystemEntry with a target path string. resolve follows the link by re-entering the resolver on the target. A visited set during resolution detects circular symlinks and raises.
- SAY·5Extension two, ReadWriteLock for concurrency: cat, ls, size, find take a shared read lock. mkdir, touch, write, rm take an exclusive write lock. Many readers, one writer. The lock lives on FileSystem, granularity is the whole tree. Per-directory locks are a v3 if contention becomes a problem.
- SAY·6Close with one sentence: FileSystemEntry abstract gives File and Directory a shared interface. Directory composes children via HashMap for O(1) lookup. resolve and resolve_parent split path walking by read versus write. find is the tree iterator. Composite makes the recursion one line each.
Interviewer is grading: You defend HashMap over list with the O(1) reasoning and the on-demand sort for ls. You volunteer symlinks with circular-detection and concurrency with the lock split unprompted. You close in one breath naming the patterns.
Code Implementation
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Optional
class FileSystemEntry(ABC):
"""Abstract base for files and directories. Composite pattern root.
Both leaf (File) and composite (Directory) extend this."""
def __init__(self, name: str):
self._name = name
self._created_at = datetime.now()
@property
def name(self) -> str:
return self._name
@abstractmethod
def is_directory(self) -> bool: ...
@abstractmethod
def size(self) -> int:
"""Files return content length. Directories return recursive total."""
...
class File(FileSystemEntry):
"""Leaf node. Holds a name and string content."""
def __init__(self, name: str, content: str = ""):
super().__init__(name)
self._content = content
def is_directory(self) -> bool:
return False
@property
def content(self) -> str:
return self._content
@content.setter
def content(self, value: str) -> None:
self._content = value
def size(self) -> int:
return len(self._content)
class Directory(FileSystemEntry):
"""Composite node. Children stored in a dict for O(1) lookup by name."""
def __init__(self, name: str):
super().__init__(name)
self._children: dict[str, FileSystemEntry] = {}
def is_directory(self) -> bool:
return True
def add_child(self, entry: FileSystemEntry) -> None:
if entry.name in self._children:
raise FileExistsError(f"'{entry.name}' already exists in '{self._name}'")
self._children[entry.name] = entry
def remove_child(self, name: str) -> FileSystemEntry:
if name not in self._children:
raise FileNotFoundError(f"'{name}' not found in '{self._name}'")
return self._children.pop(name)
def get_child(self, name: str) -> Optional[FileSystemEntry]:
return self._children.get(name)
def list_children(self) -> list[str]:
return sorted(self._children.keys())
def get_all_children(self) -> dict[str, "FileSystemEntry"]:
return dict(self._children)
def size(self) -> int:
"""Recursive size. Composite pattern makes this clean."""
return sum(child.size() for child in self._children.values())
class FileSystem:
"""Facade for all file system operations. Parses paths,
walks the tree, delegates to the right entry."""
def __init__(self):
self._root = Directory("/")
def _split_path(self, path: str) -> list[str]:
return [p for p in path.strip("/").split("/") if p]
def _resolve(self, path: str) -> FileSystemEntry:
"""Walks the full path and returns the entry at the end."""
if path == "/":
return self._root
parts = self._split_path(path)
current: FileSystemEntry = self._root
for part in parts:
if not current.is_directory():
raise FileNotFoundError(f"'{current.name}' is not a directory")
child = current.get_child(part)
if child is None:
raise FileNotFoundError(f"'{part}' not found")
current = child
return current
def _resolve_parent(self, path: str) -> tuple[Directory, str]:
"""Returns the parent directory and the final name component."""
parts = self._split_path(path)
if not parts:
raise ValueError("Cannot operate on root directly")
current = self._root
for part in parts[:-1]:
child = current.get_child(part)
if child is None or not child.is_directory():
raise FileNotFoundError(f"Directory '{part}' not found in path '{path}'")
current = child
return current, parts[-1]
def mkdir(self, path: str) -> None:
parent, name = self._resolve_parent(path)
parent.add_child(Directory(name))
def touch(self, path: str) -> None:
parent, name = self._resolve_parent(path)
if parent.get_child(name) is None:
parent.add_child(File(name))
def write(self, path: str, content: str) -> None:
entry = self._resolve(path)
if entry.is_directory():
raise IsADirectoryError(f"'{path}' is a directory")
entry.content = content
def cat(self, path: str) -> str:
entry = self._resolve(path)
if entry.is_directory():
raise IsADirectoryError(f"'{path}' is a directory")
return entry.content
def ls(self, path: str = "/") -> list[str]:
entry = self._resolve(path)
if entry.is_directory():
return entry.list_children()
return [entry.name]
def rm(self, path: str) -> None:
parent, name = self._resolve_parent(path)
parent.remove_child(name)
def size(self, path: str = "/") -> int:
return self._resolve(path).size()
def find(self, path: str, name: str) -> list[str]:
"""Recursive search for entries matching a name. Iterator pattern
over the tree structure."""
results: list[str] = []
entry = self._resolve(path)
self._find_recursive(entry, path.rstrip("/"), name, results)
return results
def _find_recursive(self, entry: FileSystemEntry, current_path: str,
target: str, results: list[str]) -> None:
if entry.name == target:
results.append(current_path)
if entry.is_directory():
for child_name, child in entry.get_all_children().items():
child_path = f"{current_path}/{child_name}"
self._find_recursive(child, child_path, target, results)
if __name__ == "__main__":
fs = FileSystem()
# Build directory structure
fs.mkdir("/home")
fs.mkdir("/home/alice")
fs.mkdir("/home/alice/docs")
fs.mkdir("/home/alice/projects")
fs.mkdir("/home/bob")
# Create and write files
fs.touch("/home/alice/docs/notes.txt")
fs.write("/home/alice/docs/notes.txt", "Meeting at 3pm. Bring laptop.")
fs.touch("/home/alice/docs/todo.txt")
fs.write("/home/alice/docs/todo.txt", "Buy groceries\nFinish report")
fs.touch("/home/alice/projects/readme.txt")
fs.write("/home/alice/projects/readme.txt", "Project Alpha")
fs.touch("/home/bob/readme.txt")
fs.write("/home/bob/readme.txt", "Hello from Bob")
# List directories
print("ls /:", fs.ls("/"))
print("ls /home:", fs.ls("/home"))
print("ls /home/alice:", fs.ls("/home/alice"))
print("ls /home/alice/docs:", fs.ls("/home/alice/docs"))
# Read files
print("\ncat /home/alice/docs/notes.txt:")
print(f" {fs.cat('/home/alice/docs/notes.txt')}")
print("cat /home/bob/readme.txt:")
print(f" {fs.cat('/home/bob/readme.txt')}")
# Check sizes
print(f"\nSize of /home/alice/docs: {fs.size('/home/alice/docs')} chars")
print(f"Size of entire fs: {fs.size('/')} chars")
# Find files by name
print(f"\nfind / readme.txt: {fs.find('/', 'readme.txt')}")
# Remove a file and verify
fs.rm("/home/bob/readme.txt")
print(f"\nAfter rm /home/bob/readme.txt, ls /home/bob: {fs.ls('/home/bob')}")
# Remove a directory with children
fs.rm("/home/alice/docs")
print(f"After rm /home/alice/docs, ls /home/alice: {fs.ls('/home/alice')}")
# Verify error handling
try:
fs.cat("/nonexistent")
except FileNotFoundError as e:
print(f"\nExpected error: {e}")
try:
fs.write("/home", "data")
except IsADirectoryError as e:
print(f"Expected error: {e}")
assert fs.ls("/home/bob") == []
assert fs.ls("/home/alice") == ["projects"]
assert fs.size("/") > 0
print("\nAll assertions passed.")Interview Grading by Level
What an interviewer at each level expects to see in your answer. Use this to calibrate, not to perform.
Junior Engineer (L3)
Builds File and Directory as separate classes and implements the seven ops, but uses isinstance checks everywhere and stores children in a list.
- Names Composite and Iterator correctly.
- Implements File with content and Directory with children.
- Writes mkdir and touch that build entries at a path.
- Recognizes that ls needs to handle both files and directories.
- Implements basic path splitting on slash.
- Stores children in a list, making lookups O(n) per segment.
- Uses isinstance checks throughout instead of polymorphism on is_directory.
- Forgets that rm on a directory should drop the subtree.
- Does not split resolve from resolve_parent, duplicating tree-walking logic.
- Misses path canonicalization at the boundary (does not handle trailing slash).
Mid-Level Engineer (L4)
Drives the design end-to-end with HashMap-backed children, polymorphic size and is_directory, and the resolve versus resolve_parent split.
- Implements Directory.children as a dict keyed by name.
- Writes Directory.size as one recursive sum with no type checks.
- Splits path resolution into resolve (reads) and resolve_parent (writes).
- Canonicalizes paths by stripping and splitting at the boundary.
- Implements find as a tree iterator that returns paths.
- Raises clear errors (FileExistsError, FileNotFoundError, IsADirectoryError) at the right boundaries.
- Does not volunteer symlinks or concurrent access unless prompted.
- Treats path validation as exception-driven rather than a single explicit checker.
- Misses that ls should return sorted names rather than insertion order.
Senior Engineer (L5+)
Volunteers symlinks with circular detection and ReadWriteLock concurrency before being asked, names the polymorphism-over-isinstance invariant, and frames each pattern as a specific extensibility win.
- Volunteers symbolic links with target-path resolution and a visited-set cycle check without prompting.
- Proposes ReadWriteLock concurrency with the shared-readers exclusive-writer model.
- Frames each pattern around the failure it prevents: Composite for isinstance sprawl across every op, Iterator for find duplication across grep, du, count.
- Names the on-demand sort in ls as the trade for HashMap children.
- Proposes Command-pattern undo with deleted-subtree capture for restore.
- Discusses permissions as a Permissions object on FileSystemEntry plus a check at the facade layer before every op.
- Closes with a one-sentence summary that names both patterns and the polymorphism invariant in under 20 seconds.
Common Mistakes
- ✗Using isinstance/instanceof checks everywhere instead of polymorphism. Adding a new entry type means touching every operation.
- ✗Storing children in a list. Linear scan for every lookup is fine for 10 files, painful for 10,000.
- ✗Forgetting that rm on a directory should be recursive. Leaving orphaned children is a silent resource leak.
- ✗Not validating paths. Operations on '/foo/bar' should fail cleanly when '/foo' does not exist.
Key Points
- ✓Composite pattern lets files and directories share a common interface. Recursive operations like size() work without type-checking.
- ✓Children stored as HashMap, not a list. Name-based lookups are O(1) instead of scanning every entry.
- ✓Path resolution walks the tree from root by splitting on '/'. Missing segments fail fast with clear errors.
- ✓FileSystemEntry is abstract with polymorphic behavior. No instanceof calls in client code.