LFU Cache
Least-frequently-used eviction in O(1). The trick is a doubly-linked list of frequency buckets — each bucket is itself a DLL of keys at that frequency.
Key Abstractions
Key, value, frequency count, optional expiresAt for TTL
DLL of entries sharing the same access frequency. MRU at head, LRU at tail for O(1) tie-break.
Public API. Maintains the key-to-entry index, orchestrates promotions, and tracks hit/miss/eviction stats.
Class Diagram
The Key Insight
Naive LFU keeps a count per key and scans for the minimum when evicting. That's O(n) per eviction. The real insight: group entries by frequency into buckets, and keep the buckets themselves ordered. Evicting becomes "grab the min-frequency bucket, pop its tail."
The second layer: within a single frequency bucket, ties must be broken by recency. A pure frequency comparison is ambiguous when two keys share the minimum. Making each bucket a doubly-linked list (MRU at head, LRU at tail) kills two birds — O(1) eviction of the least-recent-within-least-frequent, and O(1) promotion when a key's frequency bumps up.
Requirements
Functional
get(key)returns the value and increments the key's access frequencyput(key, value)inserts or updates; updating also increments frequency- When capacity is exceeded, evict the least-frequently-used key
- Tie-break among equal-frequency keys by least-recently-used
Non-Functional
- O(1) time for get and put
- Thread-safe under concurrent access
- Frequency must be preserved on put-update (updating doesn't wipe history)
Design Decisions
Why nested doubly-linked lists?
A single DLL orders by recency only — LRU's model. A heap ordered by frequency gives O(log n) per op and makes decrement tricky. Nested DLLs split the problem: the outer level indexes by frequency, the inner level orders by recency within a frequency. Both dimensions get O(1).
Why track minFreq explicitly?
Scanning buckets for "the smallest frequency with at least one entry" is O(F) where F is the number of distinct frequencies. With minFreq updated on every promotion, eviction is a direct bucket lookup. The invariant: when minFreq's bucket empties after a promotion, the new entry is at minFreq + 1, so bumping minFreq by 1 is always correct.
Why LRU tie-break instead of random or FIFO?
If two keys have frequency 1, pure LFU offers no way to pick. LRU is the industry default — it matches human intuition ("the one someone touched more recently is more likely to be touched again") and makes sequential-scan workloads bearable. Bucket-as-DLL gives LRU tie-break for free.
Why not start frequency at 0?
Frequency 1 means "accessed once — the put itself." Starting at 0 would require a special case on the first get to check "was this inserted yet or is it a ghost." Starting at 1 keeps the arithmetic consistent.
Interview Follow-ups
- "How does this compare to LRU?" LRU is simpler (one DLL, no freq tracking) and wins on scan-heavy workloads. LFU wins when access patterns are stable — the top-10% hot keys stay hot. Facebook's CDN uses LFU variants; Redis offers both.
- "What about LFU decay?" Without decay, keys that were hot last month sit forever because their freq count is huge. Solutions: age entries by halving counts periodically (LFU-Aging), or use a sliding-window count (WLFU).
- "How would you make this distributed?" Shard keys by consistent hash to nodes. Each node runs local LFU. Global LFU is usually not worth the coordination cost — approximate LFU with bloom-filter-based access counters scales better.
- "How does Caffeine's Window-TinyLFU compare?" It's a hybrid: an LRU admission window filters one-hit wonders, then TinyLFU (a count-min sketch) decides whether to admit into the main region. Beats pure LFU on mixed workloads.
Code Implementation
from __future__ import annotations
from datetime import datetime, timedelta
from threading import RLock
class Entry:
__slots__ = ("key", "value", "freq", "expires_at", "prev", "next")
def __init__(self, key=None, value=None, expires_at: datetime | None = None):
self.key = key
self.value = value
self.freq = 1
self.expires_at = expires_at
self.prev: "Entry | None" = None
self.next: "Entry | None" = None
def is_expired(self, now: datetime) -> bool:
return self.expires_at is not None and now >= self.expires_at
class FreqBucket:
"""DLL of entries with the same frequency. MRU at head, LRU at tail."""
def __init__(self):
self._head = Entry()
self._tail = Entry()
self._head.next = self._tail
self._tail.prev = self._head
self._size = 0
def add_first(self, entry: Entry) -> None:
entry.prev = self._head
entry.next = self._head.next
self._head.next.prev = entry
self._head.next = entry
self._size += 1
def remove(self, entry: Entry) -> None:
entry.prev.next = entry.next
entry.next.prev = entry.prev
entry.prev = entry.next = None
self._size -= 1
def remove_last(self) -> Entry:
if self._size == 0:
raise IndexError("empty bucket")
victim = self._tail.prev
self.remove(victim)
return victim
def is_empty(self) -> bool:
return self._size == 0
class LFUCache:
"""O(1) LFU with LRU tie-break. Thread-safe via RLock. Optional per-key TTL."""
def __init__(self, capacity: int):
if capacity <= 0:
raise ValueError("capacity must be positive")
self._capacity = capacity
self._entries: dict = {}
self._buckets: dict[int, FreqBucket] = {}
self._min_freq = 0
self._lock = RLock()
# Observability — hits/misses for tuning, evictions/expirations for capacity signals.
self._hits = 0
self._misses = 0
self._evictions = 0
self._expirations = 0
def get(self, key, now: datetime | None = None):
at = now or datetime.utcnow()
with self._lock:
entry = self._entries.get(key)
if entry is None:
self._misses += 1
return None
if entry.is_expired(at):
# Lazy expiration — drop and report a miss.
self._remove_entry(entry)
self._expirations += 1
self._misses += 1
return None
self._promote(entry)
self._hits += 1
return entry.value
def put(self, key, value, ttl: timedelta | None = None, now: datetime | None = None) -> None:
at = now or datetime.utcnow()
expires_at = at + ttl if ttl else None
with self._lock:
if self._capacity == 0:
return
existing = self._entries.get(key)
if existing is not None:
existing.value = value
existing.expires_at = expires_at
self._promote(existing)
return
if len(self._entries) >= self._capacity:
self._evict()
entry = Entry(key, value, expires_at=expires_at)
self._entries[key] = entry
bucket = self._buckets.setdefault(1, FreqBucket())
bucket.add_first(entry)
self._min_freq = 1
def stats(self) -> dict:
with self._lock:
total = self._hits + self._misses
hit_rate = self._hits / total if total else 0.0
return {
"hits": self._hits, "misses": self._misses,
"evictions": self._evictions, "expirations": self._expirations,
"hit_rate": round(hit_rate, 3),
}
def _remove_entry(self, entry: Entry) -> None:
"""Drop an entry from both the bucket and the index. Used on expiration."""
bucket = self._buckets[entry.freq]
bucket.remove(entry)
if bucket.is_empty():
del self._buckets[entry.freq]
if self._min_freq == entry.freq:
self._min_freq = min(self._buckets, default=0)
del self._entries[entry.key]
def _promote(self, entry: Entry) -> None:
old_freq = entry.freq
old_bucket = self._buckets[old_freq]
old_bucket.remove(entry)
# If we just emptied the min-freq bucket, bump minFreq.
if old_bucket.is_empty():
del self._buckets[old_freq]
if self._min_freq == old_freq:
self._min_freq = old_freq + 1
entry.freq += 1
new_bucket = self._buckets.setdefault(entry.freq, FreqBucket())
new_bucket.add_first(entry)
def _evict(self) -> None:
bucket = self._buckets[self._min_freq]
victim = bucket.remove_last()
del self._entries[victim.key]
self._evictions += 1
if bucket.is_empty():
del self._buckets[self._min_freq]
if __name__ == "__main__":
cache = LFUCache(capacity=3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
cache.get("a") # a: freq 2
cache.get("a") # a: freq 3
cache.get("b") # b: freq 2
# c is still at freq 1 — it gets evicted when d arrives.
cache.put("d", 4)
print(f"a = {cache.get('a')}") # 4 after get
print(f"b = {cache.get('b')}") # still there
print(f"c = {cache.get('c')}") # None, evicted
print(f"d = {cache.get('d')}") # there
assert cache.get("c") is None, "c should have been evicted"
assert cache.get("a") == 1
print(f"stats: {cache.stats()}")
# TTL: entry expires between two gets.
ttl_cache = LFUCache(capacity=2)
base = datetime(2026, 4, 14, 10, 0, 0)
ttl_cache.put("k", "v", ttl=timedelta(seconds=10), now=base)
assert ttl_cache.get("k", now=base + timedelta(seconds=5)) == "v"
assert ttl_cache.get("k", now=base + timedelta(seconds=15)) is None
stats = ttl_cache.stats()
print(f"TTL stats: {stats}")
assert stats["expirations"] == 1 and stats["hits"] == 1 and stats["misses"] == 1
print("All assertions passed.")Common Mistakes
- ✗Using a min-heap keyed by frequency. O(log n) per op, and updating a heap entry is painful.
- ✗Forgetting to update minFreq when a bucket empties. Next eviction hits the wrong bucket.
- ✗Not breaking frequency ties with LRU. Pure LFU is ambiguous when two keys share min frequency.
- ✗Resetting frequency to 1 on put of an existing key. That's not an update — it's a wipe.
Key Points
- ✓O(1) for both get and put using nested DLLs — one list of buckets, each bucket is a list of keys.
- ✓Track minFreq explicitly so eviction lands on the right bucket instantly, no scanning.
- ✓On tie within a frequency bucket, evict the least recently used entry — that's why buckets are DLLs not sets.
- ✓Promotion: increment frequency, move entry from bucket f to bucket f+1. Create bucket f+1 if missing.
- ✓TTL is lazy — get() removes an expired entry and reports a miss; no background reaper needed for correctness.
- ✓Stats (hits, misses, evictions, expirations) are the signal for tuning capacity and TTL in production.