Probabilistic behavior of hash tables
Dawei Hong, Jean-Camille Birget, Shushuang Man

TL;DR
This paper extends Goldreich and Ron's estimator for the collision probability of a hash function. Their estimate has a polynomial tail; the authors show that once the load factor exceeds a certain constant, the estimator has a Gaussian tail. They apply this to bound the average search time in hashing with chaining.
Contribution
It proves that the collision probability estimator has a Gaussian tail when the load factor exceeds a certain threshold, extending prior polynomial tail results.
Findings
Estimator exhibits a Gaussian tail when the load factor exceeds a certain constant
Provides upper bound for average search time in hashing with chaining
Applicable to user-specific key distributions
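To make the search-time finding concrete, here is a minimal sketch that measures the average number of key comparisons per lookup in a chained hash table. It is an empirical measurement, not the paper's estimator or its bound. The function name, its parameters, and the cost model (one comparison per chain entry inspected) are assumptions made for illustration. The "particular user" is modeled by passing query keys that may follow a different distribution from the stored keys.

```python
def average_search_cost(stored_keys, query_keys, hash_fn, num_buckets):
    """Average number of key comparisons per lookup in a chained hash table.

    Hypothetical helper for illustration: stored_keys fill the table,
    query_keys stand in for one user's lookups.
    """
    # Build the table: one list (chain) per bucket, skipping duplicate keys.
    table = {}
    for key in stored_keys:
        chain = table.setdefault(hash_fn(key) % num_buckets, [])
        if key not in chain:
            chain.append(key)

    total_comparisons = 0
    for query in query_keys:
        chain = table.get(hash_fn(query) % num_buckets, [])
        if query in chain:
            # Successful search: comparisons up to and including the match.
            total_comparisons += chain.index(query) + 1
        else:
            # Unsuccessful search: the whole chain is scanned.
            total_comparisons += len(chain)
    return total_comparisons / len(query_keys)
```

For example, with four keys in two buckets, `average_search_cost([0, 1, 2, 3], [0, 2, 1, 3], lambda k: k, 2)` scans chains `[0, 2]` and `[1, 3]` and averages the costs 1, 2, 1, 2.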
Abstract
We extend a result of Goldreich and Ron about estimating the collision probability of a hash function. Their estimate has a polynomial tail. We prove that when the load factor is greater than a certain constant, the estimator has a Gaussian tail. As an application we find an estimate of an upper bound for the average search time in hashing with chaining, for a particular user (we allow the overall key distribution to be different from the key distribution of a particular user). The estimator has a Gaussian tail.
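As an illustration of what estimating a collision probability involves, the sketch below draws independent keys from a distribution, hashes them, and reports the fraction of sampled pairs that land on the same hash value. This is a generic pairwise-collision estimator, assumed here for illustration; the exact estimator analyzed in the paper may differ. The names `estimate_collision_probability`, `hash_fn`, and `sample_key` are hypothetical.

```python
import random
from collections import Counter


def estimate_collision_probability(hash_fn, sample_key, num_samples, rng):
    """Fraction of sampled key pairs whose hash values coincide.

    Assumption: keys are drawn independently (with replacement), so the
    target quantity is Pr[hash_fn(x) == hash_fn(y)] for independent x, y.
    """
    if num_samples < 2:
        raise ValueError("need at least two samples to form a pair")
    # Count how many samples fall on each hash value.
    counts = Counter(hash_fn(sample_key(rng)) for _ in range(num_samples))
    # A hash value hit c times contributes c * (c - 1) / 2 colliding pairs.
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = num_samples * (num_samples - 1) // 2
    return colliding_pairs / total_pairs


if __name__ == "__main__":
    rng = random.Random(0)
    # Uniform keys hashed into 10 buckets: the true value is 1/10.
    estimate = estimate_collision_probability(
        hash_fn=lambda k: k % 10,
        sample_key=lambda r: r.randrange(10**6),
        num_samples=2000,
        rng=rng,
    )
    print(f"estimated collision probability: {estimate:.4f}")
```

Counting per hash value instead of comparing every pair keeps the cost linear in the number of samples. The tail results in the abstract concern how tightly an estimate of this kind concentrates around the true value.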
Taxonomy
Topics: Spam and Phishing Detection · Data Management and Algorithms · Algorithms and Data Compression
