Fast and Powerful Hashing using Tabulation
Mikkel Thorup

TL;DR
This paper surveys simple tabulation hashing methods that provide strong probabilistic guarantees, including Chernoff bounds and high independence, with practical speed and ease of implementation.
Contribution
It reviews twisted and double tabulation hashing schemes and their analyses, demonstrating their strong distributional properties and near-optimal independence.
Findings
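Simple tabulation hashing is fast and, although not even 4-independent, provides many of the guarantees normally obtained via higher independence, e.g., for linear probing and Cuckoo hashing.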
Twisted tabulation achieves Chernoff-Hoeffding bounds and small bias for min-wise hashing.
Double tabulation yields high independence, often approaching full randomness.
Abstract
Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of $c$ characters and we have precomputed character tables $h_1, \dots, h_c$ mapping characters to random hash values. A key $x = (x_1, \dots, x_c)$ is hashed to $h_1[x_1] \oplus \cdots \oplus h_c[x_c]$. This scheme is very fast with character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., linear probing and Cuckoo hashing. Next we consider twisted tabulation where one input character is…
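The simple tabulation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's code. The function name `make_simple_tabulation`, the parameters `c`, `char_bits`, `out_bits`, and `seed`, and the use of Python's `random` module as the source of table entries are all assumptions made for illustration.

```python
import random


def make_simple_tabulation(c=4, char_bits=8, out_bits=32, seed=None):
    """Build a simple tabulation hash for keys of c characters.

    All names and defaults here are hypothetical, chosen for illustration.
    Each key is an integer of c * char_bits bits, viewed as c characters.
    """
    rng = random.Random(seed)  # assumption: stand-in for a true random source
    mask = (1 << char_bits) - 1
    # One table per character position, each mapping a character
    # to a random out_bits-bit hash value.
    tables = [
        [rng.getrandbits(out_bits) for _ in range(1 << char_bits)]
        for _ in range(c)
    ]

    def h(x):
        result = 0
        for i in range(c):
            char = (x >> (i * char_bits)) & mask  # extract the i-th character
            result ^= tables[i][char]             # XOR in its table entry
        return result

    return h, tables


h, tables = make_simple_tabulation(seed=1)

# Four keys that differ only in their two lowest characters.
keys = [0x0000, 0x0001, 0x0100, 0x0101]
xor_of_hashes = 0
for k in keys:
    xor_of_hashes ^= h(k)
```

For the four keys above, each table entry involved appears an even number of times, so `xor_of_hashes` is always 0 whatever the tables contain. This dependence among four keys is one way to see why simple tabulation cannot be 4-independent.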
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · DNA and Biological Computing
