LFU Cache
Least-frequently-used eviction in O(1). The trick is a doubly-linked list of frequency buckets — each bucket is itself a DLL of keys at that frequency.
Key Abstractions
Key, value, frequency count, optional expiresAt for TTL
DLL of entries sharing the same access frequency. MRU at head, LRU at tail for O(1) tie-break.
Public API. Maintains the key-to-entry index, orchestrates promotions, and tracks hit/miss/eviction stats.
Class Diagram
The Key Insight
Naive LFU keeps a count per key and scans for the minimum when evicting. That's O(n) per eviction. The real insight: group entries by frequency into buckets, and keep the buckets themselves ordered. Evicting becomes "grab the min-frequency bucket, pop its tail."
The second layer: within a single frequency bucket, ties must be broken by recency. A pure frequency comparison is ambiguous when two keys share the minimum. Making each bucket a doubly-linked list (MRU at head, LRU at tail) kills two birds — O(1) eviction of the least-recent-within-least-frequent, and O(1) promotion when a key's frequency bumps up.
Requirements
Functional
get(key)returns the value and increments the key's access frequencyput(key, value)inserts or updates; updating also increments frequency- When capacity is exceeded, evict the least-frequently-used key
- Tie-break among equal-frequency keys by least-recently-used
Non-Functional
- O(1) time for get and put
- Thread-safe under concurrent access
- Frequency must be preserved on put-update (updating doesn't wipe history)
Design Decisions
Why nested doubly-linked lists?
A single DLL orders by recency only — LRU's model. A heap ordered by frequency gives O(log n) per op and makes decrement tricky. Nested DLLs split the problem: the outer level indexes by frequency, the inner level orders by recency within a frequency. Both dimensions get O(1).
Why track minFreq explicitly?
Scanning buckets for "the smallest frequency with at least one entry" is O(F) where F is the number of distinct frequencies. With minFreq updated on every promotion, eviction is a direct bucket lookup. The invariant: when minFreq's bucket empties after a promotion, the new entry is at minFreq + 1, so bumping minFreq by 1 is always correct.
Why LRU tie-break instead of random or FIFO?
If two keys have frequency 1, pure LFU offers no way to pick. LRU is the industry default — it matches human intuition ("the one someone touched more recently is more likely to be touched again") and makes sequential-scan workloads bearable. Bucket-as-DLL gives LRU tie-break for free.
Why not start frequency at 0?
Frequency 1 means "accessed once — the put itself." Starting at 0 would require a special case on the first get to check "was this inserted yet or is it a ghost." Starting at 1 keeps the arithmetic consistent.
Interview Follow-ups
- "How does this compare to LRU?" LRU is simpler (one DLL, no freq tracking) and wins on scan-heavy workloads. LFU wins when access patterns are stable — the top-10% hot keys stay hot. Facebook's CDN uses LFU variants; Redis offers both.
- "What about LFU decay?" Without decay, keys that were hot last month sit forever because their freq count is huge. Solutions: age entries by halving counts periodically (LFU-Aging), or use a sliding-window count (WLFU).
- "How would you make this distributed?" Shard keys by consistent hash to nodes. Each node runs local LFU. Global LFU is usually not worth the coordination cost — approximate LFU with bloom-filter-based access counters scales better.
- "How does Caffeine's Window-TinyLFU compare?" It's a hybrid: an LRU admission window filters one-hit wonders, then TinyLFU (a count-min sketch) decides whether to admit into the main region. Beats pure LFU on mixed workloads.
Code Implementation
1 from __future__ import annotations
2 from datetime import datetime, timedelta
3 from threading import RLock
4
5
6 class Entry:
7 __slots__ = ("key", "value", "freq", "expires_at", "prev", "next")
8
9 def __init__(self, key=None, value=None, expires_at: datetime | None = None):
10 self.key = key
11 self.value = value
12 self.freq = 1
13 self.expires_at = expires_at
14 self.prev: "Entry | None" = None
15 self.next: "Entry | None" = None
16
17 def is_expired(self, now: datetime) -> bool:
18 return self.expires_at is not None and now >= self.expires_at
19
20
21 class FreqBucket:
22 """DLL of entries with the same frequency. MRU at head, LRU at tail."""
23
24 def __init__(self):
25 self._head = Entry()
26 self._tail = Entry()
27 self._head.next = self._tail
28 self._tail.prev = self._head
29 self._size = 0
30
31 def add_first(self, entry: Entry) -> None:
32 entry.prev = self._head
33 entry.next = self._head.next
34 self._head.next.prev = entry
35 self._head.next = entry
36 self._size += 1
37
38 def remove(self, entry: Entry) -> None:
39 entry.prev.next = entry.next
40 entry.next.prev = entry.prev
41 entry.prev = entry.next = None
42 self._size -= 1
43
44 def remove_last(self) -> Entry:
45 if self._size == 0:
46 raise IndexError("empty bucket")
47 victim = self._tail.prev
48 self.remove(victim)
49 return victim
50
51 def is_empty(self) -> bool:
52 return self._size == 0
53
54
55 class LFUCache:
56 """O(1) LFU with LRU tie-break. Thread-safe via RLock. Optional per-key TTL."""
57
58 def __init__(self, capacity: int):
59 if capacity <= 0:
60 raise ValueError("capacity must be positive")
61 self._capacity = capacity
62 self._entries: dict = {}
63 self._buckets: dict[int, FreqBucket] = {}
64 self._min_freq = 0
65 self._lock = RLock()
66 # Observability — hits/misses for tuning, evictions/expirations for capacity signals.
67 self._hits = 0
68 self._misses = 0
69 self._evictions = 0
70 self._expirations = 0
71
72 def get(self, key, now: datetime | None = None):
73 at = now or datetime.utcnow()
74 with self._lock:
75 entry = self._entries.get(key)
76 if entry is None:
77 self._misses += 1
78 return None
79 if entry.is_expired(at):
80 # Lazy expiration — drop and report a miss.
81 self._remove_entry(entry)
82 self._expirations += 1
83 self._misses += 1
84 return None
85 self._promote(entry)
86 self._hits += 1
87 return entry.value
88
89 def put(self, key, value, ttl: timedelta | None = None, now: datetime | None = None) -> None:
90 at = now or datetime.utcnow()
91 expires_at = at + ttl if ttl else None
92 with self._lock:
93 if self._capacity == 0:
94 return
95
96 existing = self._entries.get(key)
97 if existing is not None:
98 existing.value = value
99 existing.expires_at = expires_at
100 self._promote(existing)
101 return
102
103 if len(self._entries) >= self._capacity:
104 self._evict()
105
106 entry = Entry(key, value, expires_at=expires_at)
107 self._entries[key] = entry
108 bucket = self._buckets.setdefault(1, FreqBucket())
109 bucket.add_first(entry)
110 self._min_freq = 1
111
112 def stats(self) -> dict:
113 with self._lock:
114 total = self._hits + self._misses
115 hit_rate = self._hits / total if total else 0.0
116 return {
117 "hits": self._hits, "misses": self._misses,
118 "evictions": self._evictions, "expirations": self._expirations,
119 "hit_rate": round(hit_rate, 3),
120 }
121
122 def _remove_entry(self, entry: Entry) -> None:
123 """Drop an entry from both the bucket and the index. Used on expiration."""
124 bucket = self._buckets[entry.freq]
125 bucket.remove(entry)
126 if bucket.is_empty():
127 del self._buckets[entry.freq]
128 if self._min_freq == entry.freq:
129 self._min_freq = min(self._buckets, default=0)
130 del self._entries[entry.key]
131
132 def _promote(self, entry: Entry) -> None:
133 old_freq = entry.freq
134 old_bucket = self._buckets[old_freq]
135 old_bucket.remove(entry)
136
137 # If we just emptied the min-freq bucket, bump minFreq.
138 if old_bucket.is_empty():
139 del self._buckets[old_freq]
140 if self._min_freq == old_freq:
141 self._min_freq = old_freq + 1
142
143 entry.freq += 1
144 new_bucket = self._buckets.setdefault(entry.freq, FreqBucket())
145 new_bucket.add_first(entry)
146
147 def _evict(self) -> None:
148 bucket = self._buckets[self._min_freq]
149 victim = bucket.remove_last()
150 del self._entries[victim.key]
151 self._evictions += 1
152 if bucket.is_empty():
153 del self._buckets[self._min_freq]
154
155
156 if __name__ == "__main__":
157 cache = LFUCache(capacity=3)
158 cache.put("a", 1)
159 cache.put("b", 2)
160 cache.put("c", 3)
161 cache.get("a") # a: freq 2
162 cache.get("a") # a: freq 3
163 cache.get("b") # b: freq 2
164 # c is still at freq 1 — it gets evicted when d arrives.
165 cache.put("d", 4)
166
167 print(f"a = {cache.get('a')}") # 4 after get
168 print(f"b = {cache.get('b')}") # still there
169 print(f"c = {cache.get('c')}") # None, evicted
170 print(f"d = {cache.get('d')}") # there
171
172 assert cache.get("c") is None, "c should have been evicted"
173 assert cache.get("a") == 1
174 print(f"stats: {cache.stats()}")
175
176 # TTL: entry expires between two gets.
177 ttl_cache = LFUCache(capacity=2)
178 base = datetime(2026, 4, 14, 10, 0, 0)
179 ttl_cache.put("k", "v", ttl=timedelta(seconds=10), now=base)
180 assert ttl_cache.get("k", now=base + timedelta(seconds=5)) == "v"
181 assert ttl_cache.get("k", now=base + timedelta(seconds=15)) is None
182 stats = ttl_cache.stats()
183 print(f"TTL stats: {stats}")
184 assert stats["expirations"] == 1 and stats["hits"] == 1 and stats["misses"] == 1
185 print("All assertions passed.")Common Mistakes
- ✗Using a min-heap keyed by frequency. O(log n) per op, and updating a heap entry is painful.
- ✗Forgetting to update minFreq when a bucket empties. Next eviction hits the wrong bucket.
- ✗Not breaking frequency ties with LRU. Pure LFU is ambiguous when two keys share min frequency.
- ✗Resetting frequency to 1 on put of an existing key. That's not an update — it's a wipe.
Key Points
- ✓O(1) for both get and put using nested DLLs — one list of buckets, each bucket is a list of keys.
- ✓Track minFreq explicitly so eviction lands on the right bucket instantly, no scanning.
- ✓On tie within a frequency bucket, evict the least recently used entry — that's why buckets are DLLs not sets.
- ✓Promotion: increment frequency, move entry from bucket f to bucket f+1. Create bucket f+1 if missing.
- ✓TTL is lazy — get() removes an expired entry and reports a miss; no background reaper needed for correctness.
- ✓Stats (hits, misses, evictions, expirations) are the signal for tuning capacity and TTL in production.