Skip to Content

Bloom filter

What

  • Answer the problem of “is X in the set?” using a space-efficient probabilistic data structure.
  • Bloom filter will give you “definitely no” or “probably yes”, e.g. False positive matches are possible, but false negatives are not.
  • This is the trade-off using less memory (than a hash table counterpart) in exchange of lower accuracy.
  • Drawback: - Elements can be added to the set, but not removed (though this can be addressed with the counting Bloom filter variant); the more items added, the larger the probability of false positives.

Where can it be used

  • Avoid “one-hit-wonder”. Most pages are only accessed once. Use bloom filter to decide whether we want to cache a request to enhance the overall hit-rate.

How

from ByteByteGo video

  • The bloom filter is the bit array
  • In the example in the graph, we use different hash functions to transform an input to 3 numbers. And then we mark position representing those number to true in the array.
  • Same input will always map to same 3 different position, so does different input could have. Hence the false positive could occur.
  • However, since how it’s done, if an input didn’t transform to 3 numbers that has all been set in the array, that input can’t be seen already. (no false negative)
  • So the length of array, how many number to transform for a specific input will be the trade-off for lower chance of false positive.
Last updated on