Memcached
The simplest distributed cache that actually works at scale
Architecture
Most caching debates start with "Redis or Memcached?" and the answer people want is always Redis. But if the requirement is a fast key-value cache, Memcached is the better tool. It's been running at the core of Facebook, Twitter, YouTube, and Wikipedia for over two decades. Brad Fitzpatrick built it for LiveJournal in 2003, and the design philosophy hasn't changed since. Key-value strings with TTL expiration. That's it. No data structures, no scripting, no persistence. That constraint is the whole point. It makes Memcached one of the most predictable and operationally boring (in the best way) components in a stack.
How It Works Internally
Memcached manages memory through a slab allocator. Memory is carved into pages (default 1MB each), and pages are assigned to slab classes as demand requires. Each slab class handles items within a specific size range, with sizes growing by a configurable factor (default 1.25x). So class 1 handles items up to 96 bytes, class 2 up to 120 bytes, and so on up to 1MB. When an item is stored, Memcached picks the smallest slab class it fits into. This kills malloc/free fragmentation, which is great. The tradeoff? Internal fragmentation when values don't fill their slab class. Memory gets wasted. Accept this and tune around it.
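For a feel of how those classes fall out, here's a rough sketch of the sizing math, assuming the defaults (a 96-byte starting class, 1.25 growth factor, 8-byte chunk alignment, 1MB pages); the real server derives class 1 from the item header plus the -n minimum-space setting:

```python
# Approximate reconstruction of memcached's slab class sizing under default settings.
def slab_class_sizes(start=96, factor=1.25, align=8, page_size=1024 * 1024):
    sizes = []
    size = start
    while size * factor < page_size:
        sizes.append(size)
        size = int(size * factor)
        size = (size + align - 1) // align * align  # round up to chunk alignment
    sizes.append(page_size)  # the last class holds one item per 1MB page
    return sizes

print(slab_class_sizes()[:6])  # [96, 120, 152, 192, 240, 304]
# A 121-byte value lands in the 152-byte class and wastes 31 bytes of its chunk.
```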
Each slab class maintains its own LRU (Least Recently Used) linked list. This is a detail that trips people up: eviction happens per slab class, not globally. One slab class can be evicting like crazy while another sits half-empty. The hash table used a global lock in older versions, but modern builds use fine-grained item locks striped across buckets. For request processing, a dispatcher thread accepts connections and round-robins them across a fixed pool of worker threads, each running its own event loop. Nothing fancy, but it lets the system saturate all CPU cores, which Redis's single-threaded command execution cannot.
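Because eviction is per class, the counter worth watching is the per-class evicted count from stats items, not just the global eviction number. A quick sketch over the plain text protocol (host and port are placeholders):

```python
import socket

# Collect per-slab-class eviction counters; "stats items" emits lines like
# "STAT items:<class>:evicted <count>".
def evictions_per_slab(host="127.0.0.1", port=11211):
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"stats items\r\n")
        raw = b""
        while not raw.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            raw += chunk
    evicted = {}
    for line in raw.decode().splitlines():
        parts = line.split()  # e.g. ['STAT', 'items:5:evicted', '48210']
        if len(parts) == 3 and parts[1].endswith(":evicted"):
            _, slab_class, _ = parts[1].split(":")
            evicted[int(slab_class)] = int(parts[2])
    return evicted

print(evictions_per_slab())  # e.g. {1: 0, 5: 48210, 9: 3} -- imbalance is normal
```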
Consistent hashing lives entirely in the client library. It maps keys onto a ring of 2^32 points. Each server gets placed at multiple virtual nodes on the ring (typically 100-200 per server), and keys route to the nearest server clockwise. When a server dies, only about 1/N of keys get remapped. Compare that to modulo-based hashing where losing one server remaps almost everything.
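A stripped-down version of what the client library does (server names and the virtual-node count here are illustrative, not taken from any particular client):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal client-side ring with virtual nodes."""

    def __init__(self, servers, vnodes=150):
        points = []
        for server in servers:
            for i in range(vnodes):
                points.append((self._point(f"{server}#{i}"), server))
        points.sort()
        self.ring = points
        self.keys = [p for p, _ in points]

    @staticmethod
    def _point(s):
        # Map onto the 2^32-point ring
        return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

    def server_for(self, key):
        # Walk clockwise to the first virtual node at or past the key's point
        idx = bisect.bisect(self.keys, self._point(key)) % len(self.keys)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
print(ring.server_for("user:42:profile"))
# Drop cache2 and rebuild: only ~1/3 of keys move, the rest stay put.
```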
Production Architecture
In production, Memcached runs as a pool of independent servers. They don't talk to each other. At all. Facebook's 2013 NSDI paper ("Scaling Memcache at Facebook") describes how they built mcrouter, a proxy layer that handles connection pooling, routing, cross-region replication, and failover. All the stuff Memcached itself deliberately ignores. They ran regional pools within each datacenter and replicated across regions using invalidation daemons that tailed MySQL's binlog and broadcast delete commands to Memcached clusters.
For a custom deployment: run Memcached on dedicated machines or containers with locked memory (-k flag) to prevent swapping. Set the maximum item size (-I) based on the actual workload. The default 1MB is almost always too large and wastes slab space. For very high request rates, use UDP for gets and TCP for sets to cut down on connection overhead. Facebook proved this works at scale.
Capacity Planning
A single Memcached instance on a 64GB server with 8 cores can push 500K+ GET operations per second. That's a lot of headroom for most teams. Plan memory allocation around the value size distribution. Run stats slabs to see which slab classes get the most traffic and tune the growth factor (-f) accordingly. For reference, Facebook allocated roughly 28TB of total Memcached capacity to serve billions of requests per second, with a cache hit rate above 99%.
Keep an eye on evictions, get_hits, get_misses, and bytes_read/bytes_written per server. If the hit ratio drops below 95%, either more memory is needed or the key design is wrong. Track curr_connections against the -c max connection limit (default 1024) and bump it for deployments with lots of clients. Watch for slab class imbalance through stats slabs. The slab_reassign and slab_automove features (added in 1.4.11) can rebalance memory between slab classes automatically, which eliminates manual babysitting.
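A minimal health probe that reads those counters off the text protocol (host, port, and the alert thresholds are placeholders to adapt):

```python
import socket

# Pull the global "stats" counters and derive the numbers worth alerting on.
def cache_health(host="127.0.0.1", port=11211):
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"stats\r\n")
        raw = b""
        while not raw.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            raw += chunk
    stats = {}
    for line in raw.decode().splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    hits = int(stats["get_hits"])
    misses = int(stats["get_misses"])
    return {
        "hit_ratio": hits / (hits + misses) if hits + misses else 1.0,  # alert below ~0.95
        "evictions": int(stats["evictions"]),
        "curr_connections": int(stats["curr_connections"]),  # compare against the -c limit
        "bytes_written": int(stats["bytes_written"]),
    }

print(cache_health())
```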
Failure Scenarios
Scenario 1: Thundering herd on cold cache. A Memcached server restarts or a batch of keys expires at the same time. Suddenly thousands of requests miss the cache simultaneously and pile onto the database. At Facebook scale, this could mean 100,000 concurrent queries hitting MySQL for the same popular row. Detection means watching for a hit ratio drop paired with a database query spike. The fix: implement lease-based protection where the first client to miss a key gets an exclusive lease to repopulate it, while everyone else either waits or gets a stale value. Facebook's paper also describes a cold cluster warmup mode that fills a new cluster from a warm one before routing full traffic to it.
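Stock Memcached doesn't ship Facebook's lease mechanism (that lived in their modified servers), but add, which only stores a key if it doesn't already exist, gets you a serviceable lock-based approximation. A sketch using pymemcache; the key names, TTLs, and recompute callback are placeholders:

```python
import time
from pymemcache.client.base import Client

cache = Client(("127.0.0.1", 11211))

def get_or_recompute(key, recompute, ttl=300, lease_ttl=10):
    value = cache.get(key)
    if value is not None:
        return value
    # "add" only succeeds if the lock key doesn't exist yet, so exactly one
    # client wins the right to hit the database and repopulate the cache.
    if cache.add(f"lease:{key}", b"1", expire=lease_ttl, noreply=False):
        value = recompute()
        cache.set(key, value, expire=ttl)
        cache.delete(f"lease:{key}")
        return value
    # Everyone else backs off briefly and retries the cache instead of the DB.
    time.sleep(0.05)
    return cache.get(key) or recompute()
```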
Scenario 2: Consistent hash ring goes out of sync during a rolling deploy. The team is rolling out new app servers, and the old and new instances have slightly different server lists in their hash ring configs. Now the same key routes to different Memcached servers depending on which app server handles the request. The result is duplicate cache entries, higher miss rates, and stale data. Detect it by monitoring per-server request distribution for sudden imbalances. Prevent it by using a config service (ZooKeeper, Consul) to update the server list atomically across all clients, or use mcrouter to centralize routing decisions so individual app servers don't make their own choices.
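A sketch of the config-service approach using pymemcache: the server list below stands in for whatever ZooKeeper or Consul pushes to every app server, and HashClient does the client-side sharding (it defaults to rendezvous hashing rather than a literal ring, but the operational point, identical configs everywhere, is the same):

```python
from pymemcache.client.hash import HashClient

# Every app server must build its client from the same server list, ideally
# pushed atomically from a config service rather than baked into each deploy.
SERVERS = [
    ("cache1.internal", 11211),
    ("cache2.internal", 11211),
    ("cache3.internal", 11211),
]

client = HashClient(SERVERS)
client.set("user:42:profile", b"serialized-profile", expire=300)
print(client.get("user:42:profile"))
```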
Pros
- • Dead simple. Just key-value strings, nothing to overthink
- • Multi-threaded architecture for high throughput
- • Consistent hashing makes horizontal scaling straightforward
- • Minimal memory overhead per item
- • Mature and battle-tested at massive scale
Cons
- • No persistence. Data gone on restart, full stop
- • Only supports string values (no rich data structures)
- • No built-in replication
- • No pub/sub or advanced features
- • Limited to key-value operations
When to use
- • Simple caching of serialized objects or query results
- • Need multi-threaded performance for high concurrency
- • Want a lightweight, low-overhead cache layer
- • Already have a durable primary store
When NOT to use
- • Need data persistence or durability
- • Require rich data structures (use Redis)
- • Need pub/sub or stream processing
- • Want built-in replication and failover
Key Points
- • Multi-threaded architecture uses a fixed pool of worker threads, each multiplexing many connections through an event loop, spreading work across all CPU cores. This is a fundamentally different model from Redis's single-threaded approach
- • The slab allocator pre-allocates memory into fixed-size classes (96B, 120B, 152B...) to eliminate malloc fragmentation, but the cost is wasted space when values don't fill their slab class
- • Zero persistence by design. Memcached is a pure cache. Treat every key as ephemeral, and make sure the system handles a 100% cache miss gracefully
- • All sharding is client-side via consistent hashing. The servers know nothing about each other, which makes the cluster trivially simple but also means there's no automatic failover
- • Cache stampede (thundering herd) is the biggest production risk. Use lease-based mechanisms, probabilistic early recomputation, or lock-based repopulation to keep the backend from getting crushed
- • Facebook ran the largest Memcached deployment ever built: 28TB of cache across thousands of servers handling billions of requests per second
Common Mistakes
- ✗ Expecting persistence or durability. Memcached is volatile by design. Any code path that breaks on cache miss is a critical bug, not a minor inconvenience
- ✗ Not sizing slab classes correctly. The default growth factor of 1.25 creates classes at 96B, 120B, 152B, etc. Items land in the next-largest class, wasting up to 20% of memory
- ✗ Ignoring thundering herd on cold cache. When a popular key expires, hundreds of concurrent requests slam the database at once. Use stale-while-revalidate or probabilistic expiry
- ✗ Not monitoring eviction stats. High evictions in a specific slab class mean that size range needs more memory, not that the server is out of memory overall
- ✗ Adding more servers without updating the hash ring everywhere at once. Inconsistent ring configs across app servers cause misses and stale reads during rollout