Daisy Bloom Filters

Ioana O. Bercea, Jakob Bæk Tejs Houen, and Rasmus Pagh

arXiv:2205.14894·cs.DS·June 18, 2024

TL;DR

This paper introduces the Daisy Bloom filter, a filter that uses less space and runs its operations faster than a standard Bloom filter when the distribution of the data and queries is known.

Contribution

The paper proves a lower bound on the expected space required by a distribution-aware filter and presents the Daisy Bloom filter, which is faster and more space-efficient than the standard Bloom filter.

Findings

01 Daisy Bloom filters outperform standard Bloom filters in space usage.

02 The proposed filter achieves worst-case constant time operations.

03 The filter maintains a low false positive rate with high probability.

Abstract

A filter is a widely used data structure for storing an approximation of a given set $S$ of elements from some universe $U$ (a countable set). It represents a superset $S' \supseteq S$ that is "close to $S$" in the sense that for $x \notin S$, the probability that $x \in S'$ is bounded by some $\varepsilon > 0$. The advantage of using a Bloom filter, when some false positives are acceptable, is that the space usage becomes smaller than what is required to store $S$ exactly. Though filters are well-understood from a worst-case perspective, it is clear that state-of-the-art constructions may not be close to optimal for particular distributions of data and queries. Suppose, for instance, that some elements are in $S$ with probability close to 1. Then it would make sense to always include them in $S'$, saving space by not having to represent these elements in the filter. Questions like…
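
The "always include likely elements" idea from the abstract can be illustrated with a small sketch. This is not the paper's Daisy Bloom filter construction: it is a textbook Bloom filter plus a hypothetical wrapper, and the names `BloomFilter`, `AlwaysIncludeFilter`, and `is_likely` are assumptions introduced here for illustration. The predicate `is_likely` stands in for prior knowledge of the distribution and is assumed to cost no filter space.

```python
import hashlib
import math
from typing import Callable, Iterable, Iterator


class BloomFilter:
    """Textbook Bloom filter (illustrative; not the Daisy Bloom filter)."""

    def __init__(self, n_items: int, eps: float) -> None:
        # Standard sizing: k = log2(1/eps) hash functions and
        # m = n * log2(1/eps) / ln(2) bits.
        bits_per_item = math.log2(1 / eps) / math.log(2)
        self.k = max(1, round(math.log2(1 / eps)))
        self.m = max(1, math.ceil(n_items * bits_per_item))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting one hash (simple, not fast).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1
                   for p in self._positions(item))


class AlwaysIncludeFilter:
    """Hypothetical wrapper: keys flagged by `is_likely` are always in S'."""

    def __init__(self, stored: Iterable[str],
                 is_likely: Callable[[str], bool], eps: float) -> None:
        self.is_likely = is_likely  # assumed distribution knowledge
        # Only the keys that are NOT flagged as likely consume filter bits.
        rest = [x for x in stored if not is_likely(x)]
        self.inner = BloomFilter(max(1, len(rest)), eps)
        for x in rest:
            self.inner.add(x)

    def __contains__(self, item: str) -> bool:
        # Likely keys are reported present without consulting the bit array.
        return self.is_likely(item) or item in self.inner


# Usage: half of the stored keys are assumed to be "likely" members.
S = [f"key{i}" for i in range(1000)]
hot = set(S[:500])  # stand-in for knowledge of the distribution
plain = BloomFilter(len(S), 0.01)
for x in S:
    plain.add(x)
wrapped = AlwaysIncludeFilter(S, hot.__contains__, 0.01)
```

In this sketch `wrapped.inner` is sized for only the 500 keys that are not flagged as likely, so it needs roughly half the bits of `plain`, and neither filter produces false negatives. The trade-off is that any query flagged by `is_likely` is answered "present" whether or not it is in $S$, which is only reasonable when such keys are in $S$ with probability close to 1.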