Concurrent Bloom Filter (CAS-Based)
Bloom filter: probabilistic set membership with no false negatives, configurable false positives. Concurrent version: bits stored in atomic longs; insert sets bits via OR atomically; query reads atomically. Naturally lock-free because OR is monotonic.
What a Bloom filter is
A Bloom filter is a probabilistic data structure for set membership, invented by Burton Bloom in 1970. The query "is X in the set?" returns one of two answers. Either it's definitely not there, or it's probably there. There is no "definitely there".
That sounds like a weakness but it's the whole point. In exchange for living with a small false-positive rate, the lookup takes microseconds and uses a tiny fraction of the memory a real hash set would use. A million items at a 1% false-positive rate fit in about 1.2 MB. The same million items in a HashSet would eat somewhere between 50 and 100 MB.
So a Bloom filter is the right tool for a fast cheap "is this even worth checking?" before some expensive operation. Cassandra and RocksDB use them to skip reading SST files when the key isn't there. Web crawlers track which URLs they've visited without keeping every URL string in memory. Browsers screen URLs against malware lists. CDNs use them as cache-miss hints.
How it works
A Bloom filter has two ingredients. A bit array of size m (all zeros to start). And K hash functions, each one maps an input to one of those m positions. To add an item, hash it K times and set those K bits to 1. To check whether an item is in the set, hash it K times and look at those K bits. If any of them is still 0, the item was definitely never added. If all of them are 1, the item is probably in the set, but it might be a false positive.
A worked example with m = 16 slots and K = 3 hash functions, starting from an empty filter.
Initial state. All sixteen bits are 0.
| index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bit | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
After adding alice. The three hash functions return positions 2, 7, and 14, so those three bits are set to 1.
| index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bit | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
After adding bob. The three hash functions return 7, 11, and 14. Bits 7 and 14 were already 1, so only bit 11 changes.
| index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bit | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
Three lookups against this filter:
| Query | Hash positions | Bits read | Verdict |
|---|---|---|---|
| alice | 2, 7, 14 | 1, 1, 1 | Probably in the set (correct, was added) |
| carol | 1, 7, 9 | 0, 1, 0 | Definitely not in the set (any 0 wins) |
| dave | 2, 11, 14 | 1, 1, 1 | Probably in the set: a false positive. dave was never added; alice and bob between them happened to set exactly the bits dave's hashes pick |
The story the example tells is the whole story of the data structure.
There are no false negatives because a bit only flips from 0 to 1 when an item is added. If any of dave's K bits had been 0, he was definitely never added. False positives come from collisions: as more items are added, more bits get set, and eventually some unrelated item's K positions all happen to be 1 already.
The three knobs to tune are the bit-array size m, the number of hash functions K, and the expected number of items n. Picking m and K for a target false-positive rate is the next section.
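The worked example above can be replayed in a few lines. This is a sketch with the hash positions hardcoded to match the tables (a real filter would compute them from hash functions):

```java
import java.util.Map;

class WorkedExample {
    // Hash positions hardcoded to match the example tables above.
    static final Map<String, int[]> POSITIONS = Map.of(
        "alice", new int[]{2, 7, 14},
        "bob",   new int[]{7, 11, 14},
        "carol", new int[]{1, 7, 9},
        "dave",  new int[]{2, 11, 14});

    static int bits = 0;  // all 16 slots fit in one int

    static void add(String item) {
        for (int pos : POSITIONS.get(item)) bits |= 1 << pos;
    }

    static boolean mightContain(String item) {
        for (int pos : POSITIONS.get(item))
            if ((bits & (1 << pos)) == 0) return false;  // any 0 wins
        return true;
    }

    public static void main(String[] args) {
        add("alice");
        add("bob");
        System.out.println(mightContain("alice")); // true (was added)
        System.out.println(mightContain("carol")); // false (bits 1 and 9 still 0)
        System.out.println(mightContain("dave"));  // true: the false positive
    }
}
```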
Sizing
Given expected entries n and target false-positive rate p:
- Bits needed: m = -n * ln(p) / (ln 2)^2
- Hash functions: k = (m/n) * ln 2
For one million entries at 1% false-positive rate, that works out to about 10 million bits (1.2 MB) and 7 hash functions.
A pragmatic shortcut for most cases: pick k = 7 and m = 10 * n. That lands near 1% false positives. Tune from there for lower rates.
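Plugging the formulas in directly, as a quick sizing sketch (`bitsNeeded` and `hashCount` are illustrative names, not from any library):

```java
class BloomSizing {
    // m = -n * ln(p) / (ln 2)^2
    static long bitsNeeded(long n, double p) {
        return (long) Math.ceil(-(n * Math.log(p)) / (Math.log(2) * Math.log(2)));
    }

    // k = (m/n) * ln 2
    static int hashCount(long m, long n) {
        return Math.max(1, (int) Math.round(((double) m / n) * Math.log(2)));
    }

    public static void main(String[] args) {
        long m = bitsNeeded(1_000_000, 0.01);
        int k = hashCount(m, 1_000_000);
        // ~9.6 million bits (about 1.2 MB) and k = 7 for 1M entries at 1% FP
        System.out.println(m + " bits, " + k + " hash functions");
    }
}
```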
Why the concurrent version is easy
The concurrent Bloom filter is barely different from the sequential one. The reason is that inserts are monotonic. A bit only flips one way, from 0 to 1. It never goes back.
Think about what that means for two threads racing. Two threads inserting different items set different bits, no conflict. Two threads inserting the same item set the same bits, fine, the operation is idempotent. Two threads racing on the same bit both want to set it to 1, and OR doesn't care about order. There is no scenario where one thread's write needs to be undone or sequenced before another's.
So the only change going from single-threaded to concurrent is swapping BitSet for AtomicLongArray and using an atomic OR (or a CAS retry loop on the underlying word) to flip bits. No mutex. No deadlock. The only cost is some cache-line traffic on bits that get hit by many threads at once.
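The CAS retry loop mentioned above, spelled out as a minimal sketch (`AtomicBitSet` is an illustrative name):

```java
import java.util.concurrent.atomic.AtomicLongArray;

class AtomicBitSet {
    private final AtomicLongArray words;

    AtomicBitSet(int bitCount) {
        words = new AtomicLongArray((bitCount + 63) / 64);  // 64 bits per long
    }

    // Set one bit with a CAS retry loop. The loop only retries when another
    // thread changed the word between our read and our CAS; since bits never
    // clear, the bit we want eventually sticks.
    void setBit(int bitIdx) {
        int wordIdx = bitIdx >>> 6;          // which long holds this bit
        long mask = 1L << (bitIdx & 63);     // which bit within that long
        long old;
        do {
            old = words.get(wordIdx);
            if ((old & mask) != 0) return;   // already set: nothing to do
        } while (!words.compareAndSet(wordIdx, old, old | mask));
    }

    boolean getBit(int bitIdx) {
        return (words.get(bitIdx >>> 6) & (1L << (bitIdx & 63))) != 0;
    }
}
```

The early return when the bit is already set avoids a pointless CAS, which matters on hot bits that many threads hammer.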
When this trick generalises
Any data structure where operations only push state in one direction is friendly to lock-free implementation. Bloom filter (bits go up). Max counter (value goes up). Version counter (number goes up). The hard cases are the ones that require undoing or rearranging, like linked list manipulation or hash map resize. Those need real CAS protocols and often hazard pointers for safe reclamation.
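The max counter case follows the same monotonic pattern. A minimal sketch (`MaxTracker` is an illustrative name):

```java
import java.util.concurrent.atomic.AtomicLong;

class MaxTracker {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Monotonic: the stored value only ever increases, so losing a CAS race
    // just means another thread published an equal-or-larger value first,
    // and we can safely re-check and retry or give up.
    void record(long value) {
        long cur;
        while (value > (cur = max.get())) {
            if (max.compareAndSet(cur, value)) return;
        }
    }

    long current() { return max.get(); }
}
```

The same effect can be had with `max.accumulateAndGet(value, Math::max)`; the explicit loop just makes the retry structure visible.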
When NOT to use a Bloom filter
For workloads that delete items, regular Bloom can't help. Once a bit is set, it can't safely be cleared because some other item may also depend on that bit. Use a Counting Bloom Filter (each cell is a small counter that increments and decrements) or a Cuckoo filter (cleaner deletion semantics, similar memory).
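A minimal single-threaded counting-filter sketch, to show where deletion becomes possible (counter overflow handling is omitted, and the caller must only remove items that were actually added):

```java
import java.util.zip.CRC32;

class CountingBloomFilter {
    private final int[] counts;        // a counter per cell instead of a bit
    private final int hashFunctions;

    CountingBloomFilter(int cells, int hashFunctions) {
        this.counts = new int[cells];
        this.hashFunctions = hashFunctions;
    }

    void add(byte[] data) {
        for (int i = 0; i < hashFunctions; i++) counts[idx(data, i)]++;
    }

    // Decrementing is safe because other entries sharing a cell keep its
    // count above zero; this is exactly what plain bits can't express.
    void remove(byte[] data) {
        for (int i = 0; i < hashFunctions; i++) counts[idx(data, i)]--;
    }

    boolean mightContain(byte[] data) {
        for (int i = 0; i < hashFunctions; i++)
            if (counts[idx(data, i)] == 0) return false;
        return true;
    }

    private int idx(byte[] data, int seed) {
        CRC32 c = new CRC32();
        c.update(seed);
        c.update(data);
        return (int) (c.getValue() % counts.length);
    }
}
```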
For exact membership, use a hash set. Bloom is for the "fast approximate negative" case.
For small sets, just use a hash set. The Bloom overhead only pays off at scale.
For enumerating the members, a Bloom filter simply can't. The original items aren't stored anywhere, just bits derived from them.
How it's actually used
The pattern in production code is always the same:
Look in the Bloom filter first. If it says "definitely not", skip the expensive operation entirely. If it says "probably in", fall through to the real lookup, which gives the truth. Tune the false-positive rate based on the cost of the expensive lookup. If the real check costs 100x more than the Bloom check, even a 10% false-positive rate still saves most of the work.
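The filter-then-verify pattern, as a generic sketch (`GuardedStore` and its fields are illustrative names; the `Map` stands in for any expensive source of truth such as a disk read or database query):

```java
import java.util.Map;
import java.util.function.Predicate;

class GuardedStore {
    private final Predicate<String> mightContain;  // the cheap Bloom-filter check
    private final Map<String, String> backing;     // stands in for the slow store

    GuardedStore(Predicate<String> mightContain, Map<String, String> backing) {
        this.mightContain = mightContain;
        this.backing = backing;
    }

    String get(String key) {
        if (!mightContain.test(key)) return null;  // "definitely not": skip the slow path
        return backing.get(key);                   // "probably": the real lookup decides
    }
}
```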
The concurrent version is also unusual in lock-free land. Most lock-free data structures (queues, hash maps, linked lists) require subtle CAS protocols and a memory reclamation scheme like hazard pointers or RCU. Bloom skips all of that. The only reason it works is that the data structure happens to be monotonic, and most useful structures aren't.
Implementations
Before the concurrent version, here is the basic Bloom filter. It's a bit array + a few hash functions. To add an item, hash it K times and set those K bits. To query, hash it K times and check that ALL K bits are set. That's the whole data structure.
```java
import java.util.BitSet;
import java.util.zip.CRC32;

class BloomFilter {
    private final BitSet bits;
    private final int bitCount;
    private final int hashFunctions;

    public BloomFilter(int expectedEntries, double falsePositiveRate) {
        double n = expectedEntries;
        double p = falsePositiveRate;
        // m = -n * ln(p) / (ln 2)^2
        this.bitCount = (int) Math.ceil(-(n * Math.log(p)) / Math.pow(Math.log(2), 2));
        // k = (m/n) * ln 2
        this.hashFunctions = (int) Math.round((bitCount / n) * Math.log(2));
        this.bits = new BitSet(bitCount);
    }

    public void add(byte[] data) {
        for (int i = 0; i < hashFunctions; i++) {
            int bitIdx = (int) (hash(data, i) % bitCount);
            bits.set(bitIdx); // flip 0 -> 1
        }
    }

    public boolean mightContain(byte[] data) {
        for (int i = 0; i < hashFunctions; i++) {
            int bitIdx = (int) (hash(data, i) % bitCount);
            if (!bits.get(bitIdx)) return false; // any 0 bit = definitely not in
        }
        return true; // all 1 bits = probably in
    }

    private long hash(byte[] data, int seed) {
        CRC32 c = new CRC32();
        c.update(seed); // seed byte varies the hash per function
        c.update(data);
        return c.getValue();
    }
}

// Usage:
BloomFilter bf = new BloomFilter(1_000_000, 0.01); // 1M entries, 1% FP
bf.add("alice@example.com".getBytes());
bf.mightContain("alice@example.com".getBytes()); // true
bf.mightContain("bob@example.com".getBytes());   // false (or rare false positive)
```

Now the concurrent version. Single-threaded BitSet is not thread-safe. Replace it with AtomicLongArray (each long holds 64 bits). Insert: atomically OR the bit into its word. Query: atomically read the word and check the bit. Naturally lock-free because OR is monotonic: bits only go from 0 to 1, never back.
```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.zip.CRC32;

class ConcurrentBloomFilter {
    private final AtomicLongArray bits;
    private final int bitCount;
    private final int hashFunctions;

    public ConcurrentBloomFilter(long expectedEntries, double falsePositiveRate) {
        double n = expectedEntries;
        double p = falsePositiveRate;
        this.bitCount = (int) Math.ceil(-(n * Math.log(p)) / Math.pow(Math.log(2), 2));
        this.hashFunctions = (int) Math.round((bitCount / n) * Math.log(2));
        this.bits = new AtomicLongArray((bitCount + 63) / 64);
    }

    public void add(byte[] data) {
        for (int i = 0; i < hashFunctions; i++) {
            int bitIdx = (int) (hash(data, i) % bitCount);
            int wordIdx = bitIdx >>> 6;          // which 64-bit word
            long mask = 1L << (bitIdx & 63);     // which bit within it
            // atomic OR, monotonic, no lock needed
            bits.getAndUpdate(wordIdx, w -> w | mask);
        }
    }

    public boolean mightContain(byte[] data) {
        for (int i = 0; i < hashFunctions; i++) {
            int bitIdx = (int) (hash(data, i) % bitCount);
            long word = bits.get(bitIdx >>> 6);
            if ((word & (1L << (bitIdx & 63))) == 0) return false;
        }
        return true;
    }

    private long hash(byte[] data, int seed) {
        CRC32 c = new CRC32();
        c.update(seed);
        c.update(data);
        return c.getValue();
    }
}
```

Key points
- Bloom filter: array of bits + N hash functions; insert sets N bits, query checks all N
- False positive: all N bits happen to be set by other entries (configurable rate)
- False negative: never (if all N bits are 0, definitely not in set)
- Concurrent: atomic OR, naturally lock-free, monotonic (bits only go 0→1)
- Cannot delete (would require resetting bits, but other entries may use them)
Follow-up questions
- Why is a Bloom filter naturally lock-free?
- Counting Bloom filter for delete?
- Where's this used in production?
- What about false positive rate at scale?
Gotchas
- False positive rate degrades as the filter fills; pre-size for expected load
- Hash-function quality matters: bad hashes → uneven bit distribution → worse FP rate
- Cannot delete: once a bit is set it can't be safely cleared
- Multiple Bloom filters with different hash seeds aren't interoperable
- Use a counting Bloom filter or Cuckoo filter when delete is required