Approximately Minwise Independence with Twisted Tabulation
Søren Dahlgaard, Mikkel Thorup

TL;DR
This paper shows that twisted tabulation hashing is $\tilde O(1/u^{1/c})$-minwise while using only $c = O(1)$ table lookups per key. The bias bound depends on the universe size $u$ rather than the set size, so it stays small for similarity estimation on small sets as well as large ones.
Contribution
It proves that twisted tabulation hashing is $\tilde O(1/u^{1/c})$-minwise, a significant improvement over the $\tilde O(1/n^{1/c})$ bound previously known for simple tabulation, and it does so with a simpler analysis.
Findings
Achieves $\tilde O(1/u^{1/c})$-minwise independence with constant-time evaluation.
Requires only $c = O(1)$ lookups in tables of size $u^{1/c}$, and the bias bound holds for both large and small sets.
Gives a simpler analysis than the earlier one for minwise hashing with simple tabulation.
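To make the lookup structure concrete, here is a minimal Python sketch of twisted tabulation for 32-bit keys, following the construction as commonly presented from Pătrașcu and Thorup [SODA'13]. The parameters (c = 4 characters of 8 bits, 32-bit output), the names make_tables and twisted_hash, and the use of Python's seeded PRNG to fill the tables are assumptions for illustration and are not taken from this paper.

```python
import random

C = 4             # characters per key (assumed): u = 2**32, so u**(1/C) = 2**8
CHAR_BITS = 8     # bits per character
OUT_BITS = 32     # bits of hash output (assumed)
CHAR_MASK = (1 << CHAR_BITS) - 1
TABLE_SIZE = 1 << CHAR_BITS


def make_tables(seed):
    """Fill the lookup tables with pseudo-random entries.

    The analysis assumes truly random table entries; a seeded PRNG is
    only a stand-in so that the sketch is reproducible.
    """
    rng = random.Random(seed)
    # Head table: indexed by the twisted first character.
    head = [rng.getrandbits(OUT_BITS) for _ in range(TABLE_SIZE)]
    # One table per remaining character; each entry holds a
    # (twister, hash part) pair.
    tail = [
        [(rng.getrandbits(CHAR_BITS), rng.getrandbits(OUT_BITS))
         for _ in range(TABLE_SIZE)]
        for _ in range(C - 1)
    ]
    return head, tail


def twisted_hash(x, tables):
    """Hash a 32-bit key with C table lookups."""
    head, tail = tables
    twister = 0
    h = 0
    # XOR together the entries for characters 1..C-1.
    for i in range(C - 1):
        ch = (x >> (CHAR_BITS * (i + 1))) & CHAR_MASK
        t, v = tail[i][ch]
        twister ^= t
        h ^= v
    # Twist character 0 before the final lookup.
    return h ^ head[(x & CHAR_MASK) ^ twister]


tables = make_tables(seed=1)
value = twisted_hash(0xDEADBEEF, tables)
```

Simple tabulation would index the head table with the first character directly; the only difference here is the XOR with the twister before the last lookup, so the cost stays at c lookups in tables of $u^{1/c}$ entries.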
Abstract
A random hash function $h$ is $\varepsilon$-minwise if for any set $S$, $|S| = n$, and element $x \in S$, $\Pr[h(x) = \min h(S)] = (1 \pm \varepsilon)/n$. Minwise hash functions with low bias $\varepsilon$ have widespread applications within similarity estimation. Hashing from a universe $[u]$, the twisted tabulation hashing of Pătrașcu and Thorup [SODA'13] makes $c = O(1)$ lookups in tables of size $u^{1/c}$. Twisted tabulation was invented to get good concentration for hashing-based sampling. Here we show that twisted tabulation yields $\tilde O(1/u^{1/c})$-minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS'79], $\tilde O(1/u^{1/c})$-minwise hashing requires $\Omega(\log u)$-independence [Indyk SODA'99]. Pătrașcu and Thorup [STOC'11] had shown that simple tabulation, using the same space and lookups, yields $\tilde O(1/n^{1/c})$-minwise independence,…
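The similarity-estimation application can be sketched as follows. If $h$ is $\varepsilon$-minwise, each element of $A \cup B$ is the minimum with probability $(1 \pm \varepsilon)/|A \cup B|$, so the minima of $A$ and $B$ coincide with probability $(1 \pm \varepsilon)\,J(A, B)$, where $J$ is the Jaccard similarity. The Python sketch below estimates $J$ from k repetitions. It is an illustration under stated assumptions: the function names are hypothetical, and a salted BLAKE2b digest stands in for the random hash function; it is not twisted tabulation.

```python
import hashlib


def stand_in_hash(j, x):
    """Stand-in for the j-th random hash function (assumption: salted BLAKE2b)."""
    digest = hashlib.blake2b(
        str(x).encode(),
        digest_size=8,
        salt=j.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, k):
    """Return the minimum hash value of the set under each of k hash functions."""
    return [min(stand_in_hash(j, x) for x in items) for j in range(k)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions under which both sets have the same minimum."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


A = set(range(0, 60))
B = set(range(30, 90))   # |A ∩ B| = 30, |A ∪ B| = 90, so J = 1/3
k = 256
estimate = estimate_jaccard(minhash_signature(A, k), minhash_signature(B, k))
```

The bias $\varepsilon$ of the hash function carries over directly to the estimator, which is why a bound that stays small for small sets matters: with an $\tilde O(1/n^{1/c})$ bias, a small $A \cup B$ would give a poor guarantee.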
Taxonomy
Topics: Drug Transport and Resistance Mechanisms · Limits and Structures in Graph Theory
