# On Occupancy Moments and Bloom Filter Efficiency

**Authors:** Jonathan Burns

arXiv: 1908.04810 · 2019-08-15

## TL;DR

This paper derives exact moment formulas and bounds for the occupancy and committee distributions, and applies them to analyze the false-positive rate and efficiency of Bloom filters.

## Contribution

It introduces exact moment formulas for the occupancy and committee distributions and uses them to correct the conventional Bloom filter analysis, showing that Bloom filter efficiency is monotonic in the number of hash functions.
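The occupancy moments mentioned above can be illustrated with textbook identities. The sketch below assumes t balls thrown independently and uniformly into m cells, with X the number of occupied cells. Inclusion-exclusion gives the falling factorial moments E[(X)_r] = (m)_r Σ_i (-1)^i C(r, i) (1 - i/m)^t, and raw moments follow from E[X^k] = Σ_r S(k, r) E[(X)_r], where S(k, r) are Stirling numbers of the second kind. This is a standard derivation, not necessarily the one in the paper, and all function names are hypothetical.

```python
from math import comb, perm


def stirling2(n: int, k: int) -> int:
    """Stirling number of the second kind S(n, k), via S(i, j) = j*S(i-1, j) + S(i-1, j-1)."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def falling_moment(m: int, t: int, r: int) -> float:
    """E[(X)_r]: (m)_r ordered r-tuples of distinct cells, times the
    inclusion-exclusion probability that r given cells are all occupied."""
    p_all = sum((-1) ** i * comb(r, i) * (1 - i / m) ** t for i in range(r + 1))
    return perm(m, r) * p_all  # perm(m, r) == 0 when r > m


def raw_moment(m: int, t: int, k: int) -> float:
    """E[X^k] = sum_r S(k, r) * E[(X)_r].
    The alternating sum can lose floating-point precision for large r."""
    return sum(stirling2(k, r) * falling_moment(m, t, r) for r in range(k + 1))


def bloom_fpr_from_moments(m: int, n: int, k: int) -> float:
    """Hypothetical helper: FPR = E[(X/m)^k] with t = k*n independent uniform hashes."""
    return raw_moment(m, k * n, k) / m ** k
```

For example, with m = 3 cells and t = 2 throws, `raw_moment(3, 2, 1)` gives the mean 5/3 and `raw_moment(3, 2, 2)` gives E[X^2] = 3, matching direct enumeration of the nine equally likely outcomes.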

## Key findings

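- Exact formulas for the false-positive rate and efficiency of Bloom filters are derived from the moment equations.
- The conventional Bloom filter analysis overestimates the number of hash functions required to minimize the false-positive rate.
- Bloom filter efficiency is monotonic in the number of hash functions.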
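To compare an exact false-positive rate with the conventional approximation (1 - e^{-kn/m})^k, the sketch below assumes the standard model in which all kn hash values of n inserted items are independent and uniform over m bits. The exact rate is then E[(X/m)^k], where X is the number of set bits; here it is computed by a dynamic program over the occupancy distribution rather than by the paper's closed-form moment equations. The names `occupancy_pmf`, `fpr_exact`, `fpr_conventional`, and `best_k` are hypothetical.

```python
import math


def occupancy_pmf(m: int, t: int) -> list[float]:
    """P(X = j) for j = 0..m, where X counts occupied cells after t uniform throws."""
    pmf = [0.0] * (m + 1)
    pmf[0] = 1.0
    for _ in range(t):
        nxt = [0.0] * (m + 1)
        for j, p in enumerate(pmf):
            if p == 0.0:
                continue
            nxt[j] += p * j / m                # throw hits an already-set bit
            if j < m:
                nxt[j + 1] += p * (m - j) / m  # throw sets a new bit
        pmf = nxt
    return pmf


def fpr_exact(m: int, n: int, k: int) -> float:
    """E[(X/m)^k] under independent uniform hashing of all k*n values."""
    pmf = occupancy_pmf(m, k * n)
    return sum(p * (j / m) ** k for j, p in enumerate(pmf))


def fpr_conventional(m: int, n: int, k: int) -> float:
    """The usual approximation (1 - e^{-kn/m})^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def best_k(fpr, m: int, n: int, k_max: int = 30) -> int:
    """Integer k in 1..k_max minimizing the given FPR formula."""
    return min(range(1, k_max + 1), key=lambda k: fpr(m, n, k))
```

The dynamic program costs O(knm) operations per evaluation, which is fine for small filters. By Jensen's inequality the exact value is never below Bloom's classic expression (1 - (1 - 1/m)^{kn})^k. The conventional approximation is minimized at the continuous value k = (m/n) ln 2, so comparing `best_k(fpr_exact, m, n)` with `best_k(fpr_conventional, m, n)` for specific m and n shows where the two analyses disagree under this model.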

## Abstract

Two multivariate committee distributions are shown to belong to Berg's family of factorial series distributions and Kemp's family of generalized hypergeometric factorial moment distributions. Exact moment formulas, upper and lower bounds, and statistical parameter estimators are provided for the classic occupancy and committee distributions. The derived moment equations are used to determine exact formulas for the false-positive rate and efficiency of Bloom filters -- probabilistic data structures used to solve the set membership problem. This study reveals that the conventional Bloom filter analysis overestimates the number of hash functions required to minimize the false-positive rate, and shows that Bloom filter efficiency is monotonic in the number of hash functions.
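The abstract also mentions statistical parameter estimators. As an illustration only, a method-of-moments estimator for the classic occupancy distribution sets the exact mean m(1 - (1 - 1/m)^t) equal to the observed number of occupied cells x and solves for t, giving t = ln(1 - x/m) / ln(1 - 1/m). In a Bloom filter with k hash functions this yields an estimate t/k of the number of inserted items. This is a standard construction, not necessarily one of the estimators derived in the paper, and the helper names are hypothetical.

```python
import math


def expected_occupied(m: int, t: int) -> float:
    """Exact mean of the classic occupancy distribution: m * (1 - (1 - 1/m)^t)."""
    return m * (1.0 - (1.0 - 1.0 / m) ** t)


def estimate_throws(m: int, x: float) -> float:
    """Method-of-moments estimate of t from an observed occupied count x.

    Solves expected_occupied(m, t) == x for t. Undefined when x == m,
    because the mean never reaches m for finite t.
    """
    if not 0 <= x < m:
        raise ValueError("need 0 <= x < m")
    return math.log(1.0 - x / m) / math.log(1.0 - 1.0 / m)


def estimate_items(m: int, k: int, set_bits: float) -> float:
    """Hypothetical helper: estimated number of items in a Bloom filter, t / k."""
    return estimate_throws(m, set_bits) / k
```

Because the estimator inverts the exact mean, feeding it `expected_occupied(m, t)` recovers t up to rounding; with observed counts it inherits the sampling variability of X.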

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.04810/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1908.04810/full.md

## References

62 references — full list in the complete paper: https://tomesphere.com/paper/1908.04810/full.md

---
Source: https://tomesphere.com/paper/1908.04810