On the k-Independence Required by Linear Probing and Minwise Independence
Mikkel Thorup

TL;DR
This paper determines how much independence hash functions need for linear probing and for minwise independence. Linear probing needs 5-independence to run in expected constant time, and (1+ε)-approximate minwise independence needs Ω(log 1/ε)-independence.
Contribution
It proves that linear probing needs 5-independence to run in expected constant time. It also proves that (1+ε)-approximate minwise independence requires Ω(log 1/ε)-independence. Both lower bounds match known upper bounds, those of Pagh et al. (STOC'07) and Indyk (SODA'99) respectively.
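As background on k-independence: a hash family is k-independent if any k distinct keys receive independent, uniformly distributed hash values. A standard way to obtain this is a random polynomial of degree k-1 over a prime field. This construction is general and is not specific to this paper. The Python sketch below makes several assumptions. Keys are integers smaller than the Mersenne prime 2^61 - 1, and the class name `PolyHash` is hypothetical. Reducing the result modulo the table size m makes the values only approximately uniform over [0, m).

```python
import random

# Mersenne prime 2^61 - 1; keys are assumed to be integers in [0, P).
P = (1 << 61) - 1


class PolyHash:
    """Hypothetical k-independent hash: a random polynomial of degree k - 1 over Z_P.

    h(x) = ((a_{k-1} x^{k-1} + ... + a_1 x + a_0) mod P) mod m.
    Before the final "mod m" the family is exactly k-independent over [0, P);
    the final reduction makes it only approximately k-independent over [0, m).
    """

    def __init__(self, k, m, rng=None):
        if rng is None:
            rng = random.Random()
        self.k = k
        self.m = m
        # coeffs[i] is the coefficient a_i of x^i, drawn uniformly from [0, P).
        self.coeffs = [rng.randrange(P) for _ in range(k)]

    def __call__(self, x):
        # Horner's rule, starting from the highest-degree coefficient.
        acc = 0
        for a in reversed(self.coeffs):
            acc = (acc * x + a) % P
        return acc % self.m


if __name__ == "__main__":
    h5 = PolyHash(5, 1024, random.Random(1))  # 5-independent, as linear probing needs
    print([h5(x) for x in range(8)])
```

The paper's logarithmic lower bound uses a 4-independent family that the authors construct themselves. Calling `PolyHash(4, m)` therefore does not by itself reproduce that bad behavior. The sketch only shows what "k-independent" means operationally.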
Findings
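Some 4-independent hash functions give linear probing an expected logarithmic search time, so 4-independence does not suffice.
These lower bounds are tight: 5-independence for linear probing and Θ(log 1/ε)-independence for (1+ε)-approximate minwise independence are both necessary and sufficient.
Dietzfelbinger's 2-independent multiply-shift scheme fails badly in both applications.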
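The sketch below shows where the hash function enters linear probing and what "search time" counts. It plugs a multiply-add-shift hash into a linear-probing set that reports how many slots each operation probes. As we understand it, multiply-add-shift is the 2-independent form of Dietzfelbinger's multiply-shift scheme. The sketch assumes 32-bit integer keys, 64-bit arithmetic emulated with Python integers, and a table of 2^l slots. The class names are hypothetical. It does not reproduce the structured inputs on which the paper shows multiply-shift failing.

```python
import random

W_BAR = 64                    # multiply-add-shift arithmetic is modulo 2^64
MASK = (1 << W_BAR) - 1       # emulates unsigned 64-bit wraparound


class MultiplyShift:
    """Hypothetical multiply-add-shift hash of 32-bit keys to l-bit values.

    h(x) = ((a*x + b) mod 2^64) >> (64 - l), with a, b uniform 64-bit integers.
    Because 64 >= 32 + l - 1 for l <= 32, this is the 2-independent variant.
    """

    def __init__(self, l, rng=None):
        assert 1 <= l <= 32
        if rng is None:
            rng = random.Random()
        self.l = l
        self.a = rng.getrandbits(W_BAR)
        self.b = rng.getrandbits(W_BAR)

    def __call__(self, x):
        assert 0 <= x < (1 << 32), "keys are assumed to be 32-bit"
        # Keep the top l bits of the low 64 bits of a*x + b.
        return ((self.a * x + self.b) & MASK) >> (W_BAR - self.l)


class LinearProbingSet:
    """Hypothetical open-addressing set with linear probing (no deletion, no resizing).

    The table has 2^l slots; `h` must map keys into [0, 2^l).
    """

    def __init__(self, l, h):
        self.size = 1 << l
        self.h = h
        self.slots = [None] * self.size
        self.count = 0

    def _probe(self, key):
        """Return (index of key's slot or of the first empty slot, number of probes)."""
        i = self.h(key)
        probes = 1
        while self.slots[i] is not None and self.slots[i] != key:
            i = (i + 1) & (self.size - 1)   # step to the next slot, wrapping around
            probes += 1
        return i, probes

    def insert(self, key):
        """Insert key if absent; return the number of probes used."""
        i, probes = self._probe(key)
        if self.slots[i] is None:
            if self.count == self.size - 1:
                # Keep one slot empty so unsuccessful searches always terminate.
                raise RuntimeError("table full")
            self.slots[i] = key
            self.count += 1
        return probes

    def contains(self, key):
        """Return (whether key is present, number of probes used)."""
        i, probes = self._probe(key)
        return self.slots[i] == key, probes


if __name__ == "__main__":
    table = LinearProbingSet(10, MultiplyShift(10, random.Random(42)))
    keys = random.Random(1).sample(range(1 << 32), 500)
    for x in keys:
        table.insert(x)
    avg = sum(table.contains(x)[1] for x in keys) / len(keys)
    print(f"average probes per successful search: {avg:.2f}")
```

Keeping one slot permanently empty is the simplest way to guarantee termination in this sketch. A practical table would instead resize at a load factor well below 1.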
Abstract
We show that linear probing requires 5-independent hash functions for expected constant-time performance, matching an upper bound of [Pagh et al. STOC'07]. More precisely, we construct a family of 4-independent hash functions yielding expected logarithmic search time. For (1+ε)-approximate minwise independence, we show that Ω(log 1/ε)-independent hash functions are required, matching an upper bound of [Indyk, SODA'99]. We also show that the very fast 2-independent multiply-shift scheme of Dietzfelbinger [STACS'96] fails badly in both applications.
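Minwise independence matters because of estimators such as MinHash. Suppose every element of a set is equally likely to receive the smallest hash value. Then, ignoring ties, two sets A and B share the same minimum with probability equal to their Jaccard similarity |A ∩ B| / |A ∪ B|. Under (1+ε)-approximate minwise independence, each element of a set S is the minimum with probability within (1 ± ε)/|S|. The sketch below estimates Jaccard similarity using d-independent polynomial hash functions. The choice d = 8 and the function names are illustrative assumptions. Polynomial hashing is a generic d-independent family, not the paper's lower-bound construction.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; keys are assumed to be integers in [0, P)


def make_poly_hash(d, rng):
    """Hypothetical helper: a random degree-(d-1) polynomial over Z_P (d-independent)."""
    coeffs = [rng.randrange(P) for _ in range(d)]

    def h(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule
            acc = (acc * x + a) % P
        return acc

    return h


def minhash_signature(s, hashes):
    """For each hash function, the minimum hash value over the set s."""
    return [min(h(x) for x in s) for h in hashes]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions whose minima coincide on the two sets.

    Equal minima from different elements (hash collisions) are counted too,
    but with values in [0, 2^61 - 1) that is very unlikely.
    """
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


if __name__ == "__main__":
    rng = random.Random(0)
    hashes = [make_poly_hash(8, rng) for _ in range(200)]
    a = set(range(0, 200))
    b = set(range(100, 300))  # true Jaccard similarity is 100 / 300
    est = estimate_jaccard(minhash_signature(a, hashes), minhash_signature(b, hashes))
    print(f"estimated Jaccard similarity: {est:.3f}")
```

The paper's lower bound applies to estimators of this kind. For some d-independent families, the per-element minimum probabilities stay outside (1 ± ε)/|S| unless d grows like log(1/ε).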
Taxonomy
Topics: Advanced Data Storage Technologies · Cryptography and Data Security · Algorithms and Data Compression
