Daisy Bloom Filters
Ioana O. Bercea, Jakob Bæk Tejs Houen, and Rasmus Pagh

TL;DR
This paper introduces the Daisy Bloom filter, a new data structure that improves space efficiency and operational speed over traditional Bloom filters, especially for data with known distribution properties.
Contribution
The paper proves a lower bound on the expected space of distribution-aware Bloom filters and presents the Daisy Bloom filter, a new construction that is faster and more space-efficient than a standard Bloom filter.
Findings
Daisy Bloom filters outperform standard Bloom filters in space usage.
The proposed filter supports worst-case constant-time operations.
The filter maintains a low false-positive rate with high probability.
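The findings above are stated relative to standard Bloom filters. As a reference point, the textbook sizing of a standard Bloom filter that stores n elements with false-positive rate ε uses about 1.44·log2(1/ε) bits per element and about log2(1/ε) hash functions, the same for every element regardless of how data or queries are distributed. The sketch below computes these classical parameters; the function name `standard_bloom_parameters` is hypothetical, and the result is the standard baseline, not the paper's lower bound or the Daisy Bloom filter's space bound.

```python
import math


def standard_bloom_parameters(n: int, eps: float) -> tuple[int, int]:
    """Classical sizing of a standard (distribution-oblivious) Bloom filter.

    Returns (m, k): the number of bits m and hash functions k that
    approximately minimize space for n elements at false-positive rate eps.
    """
    if n <= 0 or not (0.0 < eps < 1.0):
        raise ValueError("need n > 0 and 0 < eps < 1")
    # m = n * ln(1/eps) / (ln 2)^2, i.e. about 1.44 * log2(1/eps) bits per element
    m = math.ceil(n * math.log(1.0 / eps) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, which is about log2(1/eps)
    k = max(1, round((m / n) * math.log(2)))
    return m, k


if __name__ == "__main__":
    m, k = standard_bloom_parameters(1_000_000, 0.01)
    print(f"{m / 1_000_000:.2f} bits per element, k = {k}")
```

For n = 1,000,000 and ε = 0.01 this gives roughly 9.6 bits per element and k = 7. Every element pays this cost, which is the uniformity a distribution-aware filter tries to avoid.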
Abstract
A filter is a widely used data structure for storing an approximation of a given set $S$ of elements from some universe $U$ (a countable set). It represents a superset $S' \supseteq S$ that is "close to $S$" in the sense that for $x \notin S$, the probability that $x \in S'$ is bounded by some $\varepsilon > 0$. The advantage of using a Bloom filter, when some false positives are acceptable, is that the space usage becomes smaller than what is required to store $S$ exactly. Though filters are well-understood from a worst-case perspective, it is clear that state-of-the-art constructions may not be close to optimal for particular distributions of data and queries. Suppose, for instance, that some elements are in $S$ with probability close to 1. Then it would make sense to always include them in $S'$, saving space by not having to represent these elements in the filter. Questions like…
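The abstract's idea of always including very likely elements in the represented superset, at no space cost, can be illustrated with a Bloom filter whose number of probes k_x varies per element. This is a minimal sketch under stated assumptions: the class `VariableKBloomFilter`, the rule `k_of` that picks k_x, and `example_k_of` are all hypothetical, and none of this is the authors' Daisy Bloom filter construction, which tailors the filter to the distributions of data and queries as described in the paper.

```python
import hashlib
from collections.abc import Callable


class VariableKBloomFilter:
    """Bloom filter in which each element x is probed k_x times.

    Illustrative sketch: k_of(x) is a hypothetical stand-in for a
    distribution-aware rule. An element with k_x == 0 is never written
    to the bit array and is always reported as present.
    """

    def __init__(self, m: int, k_of: Callable[[str], int]) -> None:
        if m <= 0:
            raise ValueError("m must be positive")
        self.m = m
        self.bits = bytearray(m)  # one byte per bit, for readability
        self.k_of = k_of          # hypothetical: element -> k_x >= 0

    def _positions(self, x: str, k: int) -> list[int]:
        # Double hashing: k positions derived from two 64-bit hash values.
        digest = hashlib.sha256(x.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(k)]

    def insert(self, x: str) -> None:
        for pos in self._positions(x, self.k_of(x)):
            self.bits[pos] = 1

    def query(self, x: str) -> bool:
        # Inserted elements always pass (no false negatives); with k_x == 0
        # the check runs over zero positions, so all() returns True.
        return all(self.bits[pos] for pos in self._positions(x, self.k_of(x)))


def example_k_of(x: str) -> int:
    # Hypothetical rule: treat "popular" as (almost) surely in the set.
    return 0 if x == "popular" else 7


if __name__ == "__main__":
    f = VariableKBloomFilter(m=1024, k_of=example_k_of)
    f.insert("apple")
    print(f.query("apple"))    # True: inserted elements are never missed
    print(f.query("popular"))  # True: k_x == 0, included for free
```

Setting k_x = 0 saves all bits for that element but turns it into a false positive whenever it is queried while absent, so a distribution-aware filter has to weigh that risk against the space saved.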
