Fast hashing with Strong Concentration Bounds
Anders Aamand, Jakob B. T. Knudsen, Mathias B. T. Knudsen, Peter M. R. Rasmussen, Mikkel Thorup

TL;DR
This paper gives an improved analysis of simple tabulation hashing and introduces a new scheme, tabulation-permutation hashing, that provides Chernoff-style concentration bounds for hash-based sums even when the expected values of those sums exceed the limits required by earlier analyses.
Contribution
The paper develops a new analysis of simple tabulation and a novel hashing scheme, tabulation-permutation hashing, that achieves Chernoff-style concentration bounds without the earlier restrictions on the expected values of the sums.
Findings
Tabulation-permutation hashing is at most twice as slow as simple tabulation.
It offers Chernoff-style concentration bounds for expected sums larger than the limits of earlier analyses.
The new analysis improves what can be proved about hash-based sums over large data sets.
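The summary above does not spell out the construction. The sketch below assumes the definition from the full paper as we understand it: a simple tabulation hash producing d output characters, followed by an independent random permutation applied to each output character. The names `make_tab_perm` and `tab_perm_hash`, the parameter choices (8-bit characters, c = 8, d = 4), and the use of Python's `random` module are illustrative assumptions, not the authors' implementation.

```python
import random

CHAR_BITS = 8                 # assumed character width
SIGMA = 1 << CHAR_BITS        # size of the character domain

def make_tab_perm(rng, c=8, d=4):
    """Draw c character tables with d-character entries, plus d permutations."""
    out_bits = d * CHAR_BITS
    tables = [[rng.getrandbits(out_bits) for _ in range(SIGMA)]
              for _ in range(c)]
    perms = []
    for _ in range(d):
        p = list(range(SIGMA))
        rng.shuffle(p)        # uniformly random permutation of the character domain
        perms.append(p)
    return tables, perms

def tab_perm_hash(key, tables, perms):
    """Simple tabulation, then a per-character permutation of the result."""
    mask = SIGMA - 1
    g = 0
    for i, table in enumerate(tables):        # c lookups: simple tabulation
        g ^= table[(key >> (i * CHAR_BITS)) & mask]
    h = 0
    for j, perm in enumerate(perms):          # d lookups: permute each output character
        ch = (g >> (j * CHAR_BITS)) & mask
        h |= perm[ch] << (j * CHAR_BITS)
    return h
```

In this sketch the permutation step costs d extra lookups on top of the c lookups of simple tabulation, so with d at most c the total is at most 2c lookups. That is consistent with, though not a proof of, the "at most twice as slow" finding.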
Abstract
Previous work on tabulation hashing by Patrascu and Thorup from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation offered Chernoff-style concentration bounds on hash-based sums, e.g., the number of balls/keys hashing to a given bin, but under some quite severe restrictions on the expected values of these sums. The basic idea in tabulation hashing is to view a key as consisting of $c = O(1)$ characters, e.g., a 64-bit key as $c = 8$ characters of 8 bits. The character domain $\Sigma$ should be small enough that character tables of size $|\Sigma|$ fit in fast cache. The schemes then use $O(1)$ tables of this size, so the space of tabulation hashing is $O(|\Sigma|)$. However, the concentration bounds by Patrascu and Thorup only apply if the expected sums are $\ll |\Sigma|$. To see the problem, consider the very simple case where we use tabulation hashing to throw balls…
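The character-table idea in the abstract can be illustrated with a minimal Python sketch of simple tabulation. It assumes 64-bit keys split into 8 characters of 8 bits, 32-bit hash values, and one random table per character position whose looked-up entries are XORed together. The names `make_tables` and `simple_tabulation`, the output width, and the use of Python's `random` module to fill the tables are illustrative assumptions, not taken from the paper.

```python
import random

C = 8                     # characters per key (64-bit key, 8-bit characters)
CHAR_BITS = 8
SIGMA = 1 << CHAR_BITS    # size of the character domain, 256 here
OUT_BITS = 32             # assumed hash value width

def make_tables(rng, c=C, sigma=SIGMA, out_bits=OUT_BITS):
    """One table of random hash values per character position."""
    return [[rng.getrandbits(out_bits) for _ in range(sigma)]
            for _ in range(c)]

def simple_tabulation(key, tables, char_bits=CHAR_BITS):
    """XOR together one table lookup per character of the key."""
    mask = (1 << char_bits) - 1
    h = 0
    for i, table in enumerate(tables):
        h ^= table[(key >> (i * char_bits)) & mask]
    return h

# Usage: hash a key into one of m bins (m a power of two in this sketch).
rng = random.Random(2024)
tables = make_tables(rng)
m = 1 << 10
bin_index = simple_tabulation(0x0123456789ABCDEF, tables) & (m - 1)
```

With these parameters the tables hold 8 × 256 entries in total, which is the sense in which the space is proportional to the size of the character domain.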
