Bloom filter
- from src: wiki
- from ByteByteGo
What
- Answer the problem of “is X in the set?” using a space-efficient probabilistic data structure.
- Bloom filter will give you “definitely no” or “probably yes”, e.g. False positive matches are possible, but false negatives are not.
- This is the trade-off using less memory (than a hash table counterpart) in exchange of lower accuracy.
- Drawback: - Elements can be added to the set, but not removed (though this can be addressed with the counting Bloom filter variant); the more items added, the larger the probability of false positives.
Where can it be used
- Avoid “one-hit-wonder”. Most pages are only accessed once. Use bloom filter to decide whether we want to cache a request to enhance the overall hit-rate.
How

- The bloom filter is the bit array
- In the example in the graph, we use different hash functions to transform an input to 3 numbers. And then we mark position representing those number to true in the array.
- Same input will always map to same 3 different position, so does different input could have. Hence the false positive could occur.
- However, since how it’s done, if an input didn’t transform to 3 numbers that has all been set in the array, that input can’t be seen already. (no false negative)
- So the length of array, how many number to transform for a specific input will be the trade-off for lower chance of false positive.
Last updated on