File Storage (Dropbox)
Chunked uploads, content-addressed storage, versioning. The win is deduplication — two users uploading the same file only store the chunks once.
Key Abstractions
- File: user-facing entity — owner, path, versions, current head
- Version: a snapshot of the file — list of chunk hashes, timestamp, size
- Chunk: fixed-size blob (4MB typical), content-addressed by SHA-256 of its bytes
- BlobStore: low-level store keyed by chunk hash; typically S3 or equivalent
- MetadataStore: tree of folders, files, and versions — the source of truth for listings
- Uploader: orchestrates chunking, dedup, upload, download; also exposes delete/undelete/purge for the trash flow
The Key Insight
Dropbox scales because it doesn't store files — it stores chunks keyed by their content hash. Two users uploading the same PDF only store the PDF's chunks once. The same file saved twice adds zero bytes to cold storage. This single design choice is worth billions in infrastructure cost.
Separation of metadata and blob stores is the second lever. Metadata — who owns what, path trees, version history — is small, mutates often, and needs transactions. Blobs are large, immutable, and eventually consistent. Treating them as one database is a recipe for suffering. A Postgres-for-metadata / S3-for-blobs split is the industry default.
The third insight is immutable versions. A version is a list of chunk hashes, not a mutation of prior state. Restoring v5 after editing to v7 creates v8 that shares chunks with v5. No data is ever overwritten, which means corruption, ransomware, and "I deleted the wrong file" all become recoverable.
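To make the metadata/blob split concrete, here is a sketch of the BlobStore interface from the implementation below backed by S3 via boto3. The bucket name is a placeholder and the client wiring is an assumption; a production version would add retries, multipart puts for large chunks, and lifecycle policies.

import boto3
from botocore.exceptions import ClientError

class S3BlobStore:
    """Content-addressed blobs in S3, keyed by chunk hash. Illustrative sketch only."""

    def __init__(self, bucket: str = "chunk-store-demo"):  # placeholder bucket name
        self._s3 = boto3.client("s3")  # assumes credentials are configured
        self._bucket = bucket

    def put(self, hash_: str, data: bytes) -> None:
        # Idempotent: re-putting the same key writes the same bytes.
        self._s3.put_object(Bucket=self._bucket, Key=hash_, Body=data)

    def get(self, hash_: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=hash_)["Body"].read()

    def exists(self, hash_: str) -> bool:
        try:
            self._s3.head_object(Bucket=self._bucket, Key=hash_)
            return True
        except ClientError:
            return False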
Requirements
Functional
- Upload a file, chunk it, store only unique chunks
- Download the head version or any earlier version
- Restore to an earlier version (creates a new head pointing to old chunks)
- Resume a partial upload from the last acknowledged chunk
- Multiple users can store the same content with zero duplication
Non-Functional
- Storage cost scales with unique content, not total uploads
- Download parallelizable (fetch chunks concurrently; see the sketch after this list)
- Metadata updates are transactional; blob writes are eventually consistent
- Bit-exact recovery for any stored version
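Because chunks are independent and immutable, the download path parallelizes trivially. A minimal sketch, assuming any store with the get-by-hash method used throughout this design; executor.map preserves input order, so reassembly is just a join:

from concurrent.futures import ThreadPoolExecutor

def download_parallel(blob, chunk_hashes: list[str], workers: int = 8) -> bytes:
    # Fetch chunks concurrently; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(blob.get, chunk_hashes))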
Design Decisions
Why content-addressed storage?
Deduplication is the economic moat. With path-keyed storage, the same bytes uploaded by 100k users cost 100k × filesize. Content-addressed, it costs filesize × 1. At petabyte scale this is the difference between a profitable product and bankruptcy.
Why 4MB chunks?
Small chunks give better dedup (an in-place single-byte edit rewrites only one chunk, though a single-byte insertion shifts every later chunk, which is the weakness rolling-hash chunking fixes). Large chunks amortize per-chunk overhead (HTTP round-trip, hash computation, index entry). 4MB is Dropbox's observed sweet spot; sync and backup tools such as rsync and restic use variable-length chunks via a rolling hash for even better dedup.
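For contrast, here is a toy content-defined chunker in the spirit of those rolling-hash schemes. A boundary is declared wherever a rolling sum over a small window matches a bit mask, so an insertion resynchronizes after a few bytes instead of shifting every later chunk. The window, mask, and size bounds are illustrative, not tuned values.

def cdc_split(data: bytes, window: int = 16, mask: int = 0x3F,
              min_size: int = 32, max_size: int = 1024) -> list[bytes]:
    """Toy content-defined chunking: cut where a rolling sum matches the mask."""
    chunks: list[bytes] = []
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]  # slide the window forward
        size = i - start + 1
        if (size >= min_size and (rolling & mask) == mask) or size >= max_size:
            chunks.append(data[start:i + 1])  # boundary depends only on local content
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # final partial chunk
    return chunks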
Why SHA-256 and not MD5?
MD5 collisions are trivial to generate. If two files share an MD5, one silently overwrites the other — catastrophic. SHA-256 collisions have never been demonstrated. The extra CPU cost is negligible next to the network I/O.
Why immutable versions?
Mutating a file in place means the old bytes are gone the instant the write completes. Users delete the wrong thing, malware encrypts everything, a bug corrupts data — and without versioning, there's no "restore to yesterday." Immutability makes recovery a metadata operation.
Why resumable uploads?
A 10GB file over a flaky mobile connection fails halfway every time. Chunked uploads with per-chunk ACK let the client resume from the last confirmed chunk. Dropbox's success on mobile hinges on this.
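The implementation below uploads a file in one call; a resumable flow wraps the same loop in a session that remembers which chunk indices have been acknowledged. A minimal sketch, where UploadSession and its acked set are illustrative names and blob is any object with the exists/put interface used in the code below:

import hashlib

class UploadSession:
    """Tracks per-chunk ACKs so a client can resume after a dropped connection."""

    def __init__(self, blob, chunk_size: int = 4 * 1024 * 1024):
        self._blob = blob
        self._chunk_size = chunk_size
        self.acked: set[int] = set()  # chunk indices the server has confirmed

    def upload(self, content: bytes) -> list[str]:
        """Upload every unacked chunk; calling again after a failure resumes, not restarts."""
        hashes = []
        for chunk_no, offset in enumerate(range(0, len(content), self._chunk_size)):
            data = content[offset:offset + self._chunk_size]
            hashes.append(hashlib.sha256(data).hexdigest())
            if chunk_no in self.acked:
                continue  # server already confirmed this chunk; skip it on resume
            if not self._blob.exists(hashes[-1]):
                self._blob.put(hashes[-1], data)
            self.acked.add(chunk_no)  # in a real client this is set by the server's ACK
        return hashes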
Interview Follow-ups
- "How would you share a file?" Add a
Shareentity:(fileId, sharedWith, permission). When Bob accesses a file shared by Alice, the system looks up the share, grants read-only access to the same chunks. No data copy. - "What about conflict resolution when two clients edit offline?" Three options: last-writer-wins (simple, loses data), conflict copy (Alice's file becomes
demo (conflict - Alice).txt), or CRDT merge (complex, only works for structured formats). - "How would real-time sync work?" A long-poll or WebSocket per client listening for metadata changes on watched paths. Server pushes deltas; client pulls only the new chunks.
- "How do you handle massive files (1TB+)?" Multipart upload with parallel chunk uploads. Metadata entry created upfront; chunks trickle in. Finalize builds the version record once all chunks land.
- "How does end-to-end encryption fit?" Client-side AES-GCM encryption before hashing. Dedup breaks (ciphertext differs per encryption) unless convergent encryption is used — key derived from content hash.
Code Implementation
from __future__ import annotations
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime
from threading import RLock
import hashlib
import uuid


CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB — matches Dropbox's real-world size


@dataclass(frozen=True)
class Chunk:
    hash: str
    size: int
    data: bytes


def chunk_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class BlobStore(ABC):
    """Low-level content-addressed storage."""

    @abstractmethod
    def put(self, hash_: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, hash_: str) -> bytes: ...

    @abstractmethod
    def exists(self, hash_: str) -> bool: ...


class InMemoryBlobStore(BlobStore):
    def __init__(self):
        self._blobs: dict[str, bytes] = {}
        self._lock = RLock()
        self.put_calls = 0  # for dedup demonstration

    def put(self, hash_: str, data: bytes) -> None:
        with self._lock:
            self.put_calls += 1
            if hash_ not in self._blobs:
                self._blobs[hash_] = data

    def get(self, hash_: str) -> bytes:
        with self._lock:
            if hash_ not in self._blobs:
                raise KeyError(f"chunk {hash_[:8]} missing")
            return self._blobs[hash_]

    def exists(self, hash_: str) -> bool:
        with self._lock:
            return hash_ in self._blobs

    def unique_blob_count(self) -> int:
        with self._lock:
            return len(self._blobs)


@dataclass
class Version:
    id: str
    size: int
    created_at: datetime
    chunk_hashes: list[str]
    author_id: str


@dataclass
class File:
    id: str
    owner: str
    path: str
    versions: list[Version] = field(default_factory=list)
    head_version_id: str = ""
    deleted: bool = False  # soft-delete (trash)
    deleted_at: datetime | None = None


class MetadataStore:
    def __init__(self):
        self._by_path: dict[str, File] = {}  # "owner:path" -> File
        self._by_id: dict[str, File] = {}
        self._lock = RLock()

    def get_or_create(self, owner: str, path: str) -> File:
        with self._lock:
            key = f"{owner}:{path}"
            existing = self._by_path.get(key)
            if existing is not None and not existing.deleted:
                return existing
            if existing is not None and existing.deleted:
                # Re-uploading to a soft-deleted path resurrects it with history intact.
                existing.deleted = False
                existing.deleted_at = None
                return existing
            f = File(id=str(uuid.uuid4())[:8], owner=owner, path=path)
            self._by_path[key] = f
            self._by_id[f.id] = f
            return f

    def add_version(self, file_id: str, version: Version) -> None:
        with self._lock:
            f = self._by_id[file_id]
            f.versions.append(version)
            f.head_version_id = version.id

    def get_file(self, file_id: str) -> File:
        with self._lock:
            return self._by_id[file_id]

    def list(self, owner: str, include_trash: bool = False) -> list[File]:
        with self._lock:
            return [
                f for k, f in self._by_path.items()
                if k.startswith(f"{owner}:") and (include_trash or not f.deleted)
            ]

    def trash(self, owner: str) -> list[File]:
        with self._lock:
            return [f for k, f in self._by_path.items() if k.startswith(f"{owner}:") and f.deleted]

    def mark_deleted(self, file_id: str) -> None:
        with self._lock:
            f = self._by_id[file_id]
            if f.deleted:
                return
            f.deleted = True
            f.deleted_at = datetime.utcnow()

    def restore(self, file_id: str) -> None:
        with self._lock:
            f = self._by_id[file_id]
            f.deleted = False
            f.deleted_at = None

    def purge(self, file_id: str) -> list[str]:
        """Permanently remove metadata. Returns chunk hashes that *might* be orphaned."""
        with self._lock:
            f = self._by_id.pop(file_id, None)
            if f is None:
                return []
            self._by_path.pop(f"{f.owner}:{f.path}", None)
            return [h for v in f.versions for h in v.chunk_hashes]


class Uploader:
    def __init__(self, blob: BlobStore, meta: MetadataStore, chunk_size: int = CHUNK_SIZE):
        self._blob = blob
        self._meta = meta
        self._chunk_size = chunk_size

    def upload(self, owner: str, path: str, content: bytes) -> Version:
        file = self._meta.get_or_create(owner, path)
        chunks = self._split(content)
        hashes = []
        for data in chunks:
            h = chunk_hash(data)
            hashes.append(h)
            # Skip the upload if this chunk already exists — dedup across users and versions.
            if not self._blob.exists(h):
                self._blob.put(h, data)
        version = Version(
            id=str(uuid.uuid4())[:8],
            size=len(content),
            created_at=datetime.utcnow(),
            chunk_hashes=hashes,
            author_id=owner,
        )
        self._meta.add_version(file.id, version)
        return version

    def download(self, file_id: str, version_id: str | None = None) -> bytes:
        file = self._meta.get_file(file_id)
        if file.deleted and version_id is None:
            raise LookupError("file is in trash — restore it first, or request a version explicitly")
        # No version requested means the head; otherwise look up the requested one.
        wanted = file.head_version_id if version_id is None else version_id
        target = next((v for v in file.versions if v.id == wanted), None)
        if target is None:
            raise LookupError("version not found")
        return b"".join(self._blob.get(h) for h in target.chunk_hashes)

    def delete_file(self, file_id: str) -> None:
        """Soft delete — file moves to trash but chunks stay put. Reversible."""
        self._meta.mark_deleted(file_id)

    def undelete_file(self, file_id: str) -> None:
        """Restore a soft-deleted file. Head version is preserved."""
        self._meta.restore(file_id)

    def purge_file(self, file_id: str) -> int:
        """Permanent delete. Returns number of chunks released for GC (best-effort)."""
        chunk_hashes = self._meta.purge(file_id)
        # In-memory demo doesn't ref-count chunks across files; return the list size.
        return len(chunk_hashes)

    def restore(self, file_id: str, version_id: str) -> Version:
        """Roll the head back to an earlier version by creating a new version that copies the old chunks."""
        file = self._meta.get_file(file_id)
        source = next((v for v in file.versions if v.id == version_id), None)
        if source is None:
            raise LookupError("version not found")
        restored = Version(
            id=str(uuid.uuid4())[:8],
            size=source.size,
            created_at=datetime.utcnow(),
            chunk_hashes=list(source.chunk_hashes),
            author_id=file.owner,
        )
        self._meta.add_version(file.id, restored)
        return restored

    def _split(self, content: bytes) -> list[bytes]:
        if not content:
            return [b""]  # an empty file still gets one (empty) chunk so it has a version
        return [content[i:i + self._chunk_size] for i in range(0, len(content), self._chunk_size)]


if __name__ == "__main__":
    blob = InMemoryBlobStore()
    meta = MetadataStore()
    uploader = Uploader(blob, meta, chunk_size=16)  # tiny chunks for demo

    # Upload a file.
    content_v1 = b"Hello, this is a demo of chunked upload with dedup."
    v1 = uploader.upload("alice", "demo.txt", content_v1)
    print(f"v1: {v1.id} — {len(v1.chunk_hashes)} chunks, {blob.unique_blob_count()} unique blobs")

    # Upload again, unchanged. Metadata creates a new version but no new blobs.
    v2 = uploader.upload("alice", "demo.txt", content_v1)
    print(f"v2 (same content): {blob.unique_blob_count()} unique blobs still — full dedup")

    # Edit the tail of the file. Only the changed chunks upload.
    content_v3 = b"Hello, this is a demo of chunked upload with dedup + SMALL CHANGE"
    v3 = uploader.upload("alice", "demo.txt", content_v3)
    print(f"v3 (edit): {blob.unique_blob_count()} unique blobs, put_calls={blob.put_calls}")

    # Another user uploads the same v1 content. Zero new blobs.
    before = blob.unique_blob_count()
    uploader.upload("bob", "shared.txt", content_v1)
    print(f"Bob's upload added {blob.unique_blob_count() - before} blobs (should be 0)")

    # Download the head and a specific version.
    file_id = meta.list("alice")[0].id
    head = uploader.download(file_id)
    rollback = uploader.download(file_id, version_id=v1.id)
    print(f"Head size: {len(head)}, v1 size: {len(rollback)}")
    assert head == content_v3
    assert rollback == content_v1

    # Restore v1 as the new head.
    uploader.restore(file_id, v1.id)
    print(f"After version restore, head == v1 content: {uploader.download(file_id) == content_v1}")

    # Trash → restore → purge.
    uploader.delete_file(file_id)
    assert meta.get_file(file_id).deleted
    print(f"After delete, visible files: {len(meta.list('alice'))}; trash: {len(meta.trash('alice'))}")
    try:
        uploader.download(file_id)
    except LookupError as e:
        print(f"Download on trashed file blocked: {e}")

    uploader.undelete_file(file_id)
    assert not meta.get_file(file_id).deleted
    print(f"After undelete, visible files: {len(meta.list('alice'))}")

    # Purge removes metadata permanently; a real system would ref-count chunks and GC orphans.
    purged = uploader.purge_file(file_id)
    print(f"Purged {purged} chunk reference(s); file reachable: "
          f"{file_id in [f.id for f in meta.list('alice', include_trash=True)]}")

Common Mistakes
- ✗ Storing full files on every upload. At scale, duplication destroys the cost model.
- ✗ Letting the metadata store hold the bytes. Postgres doesn't want to be a blob store.
- ✗ Locking the whole file on write. Conflicting writes should produce a conflict copy, not block.
- ✗ Weak hashing (MD5, CRC). Collisions are rare but catastrophic — two different files sharing a chunk corrupts both.
Key Points
- ✓ Content-addressed storage: hash the chunk, key the blob by hash. Same bytes uploaded twice = one blob.
- ✓ Chunking enables resumable uploads. A failed 100MB upload resumes from chunk N, not byte 0.
- ✓ Metadata and blob stores are separate. Metadata is small and transactional; blobs are huge and eventually consistent.
- ✓ Versions are immutable lists of chunk hashes. Restoring creates a new head version that reuses the old version's chunks.