TL;DR
This paper proves that recursive hash families for n-grams can be at most pairwise independent, and shows that hashing by cyclic polynomials becomes pairwise independent once n-1 bits are ignored. In the authors' experiments, it runs twice as fast as hashing by irreducible polynomials.
Contribution
It establishes theoretical limitations of recursive hash families and proposes a practical, scalable hashing approach using cyclic polynomials with empirical performance benefits.
Findings
Recursive hash families are at most pairwise independent.
In the authors' experiments, hashing by cyclic polynomials is twice as fast as hashing by irreducible polynomials.
Randomized Karp-Rabin hashes are not pairwise independent.
Abstract
Many applications use sequences of n consecutive symbols (n-grams). Hashing these n-grams can be a performance bottleneck. For more speed, recursive hash families compute hash values by updating previous values. We prove that recursive hash families cannot be more than pairwise independent. While hashing by irreducible polynomials is pairwise independent, our implementations either run in time O(n) or use an exponential amount of memory. As a more scalable alternative, we make hashing by cyclic polynomials pairwise independent by ignoring n-1 bits. Experimentally, we show that hashing by cyclic polynomials is twice as fast as hashing by irreducible polynomials. We also show that randomized Karp-Rabin hash families are not pairwise independent.
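To illustrate the recursive update described in the abstract, here is a minimal Python sketch of hashing by cyclic polynomials. It is an illustration under stated assumptions, not the authors' implementation. The names `CyclicHash`, `rotl`, and `rolling_digests`, the 32-bit word size, the byte alphabet, and the choice to drop the low n-1 bits are all assumptions made for this example. The abstract only says that n-1 bits are ignored.

```python
import random


def rotl(x, r, L):
    """Rotate the L-bit integer x left by r positions."""
    r %= L
    return ((x << r) | (x >> (L - r))) & ((1 << L) - 1)


class CyclicHash:
    """Sketch of a cyclic-polynomial rolling hash over byte n-grams (hypothetical API)."""

    def __init__(self, n, L=32, seed=None):
        if not 1 <= n <= L:
            raise ValueError("need 1 <= n <= L so that n-1 bits can be discarded")
        self.n, self.L = n, L
        rng = random.Random(seed)
        # Random L-bit value for each possible byte (the per-symbol hash h1).
        self.table = [rng.getrandbits(L) for _ in range(256)]

    def hash_ngram(self, ngram):
        """Direct O(n) computation: rotate the running value by one bit, then XOR in h1(symbol)."""
        h = 0
        for sym in ngram:
            h = rotl(h, 1, self.L) ^ self.table[sym]
        return h

    def roll(self, h, out_sym, in_sym):
        """O(1) update: slide the window by removing out_sym and appending in_sym."""
        return (rotl(h, 1, self.L)
                ^ rotl(self.table[out_sym], self.n, self.L)
                ^ self.table[in_sym])

    def digest(self, h):
        """Ignore n-1 bits (assumption: the low ones), keeping L-n+1 bits."""
        return h >> (self.n - 1)


def rolling_digests(data, hasher):
    """Return the digest of every n-gram of the bytes object data, using the O(1) update."""
    n = hasher.n
    if len(data) < n:
        return []
    h = hasher.hash_ngram(data[:n])
    out = [hasher.digest(h)]
    for i in range(n, len(data)):
        h = hasher.roll(h, data[i - n], data[i])
        out.append(hasher.digest(h))
    return out
```

Each window after the first costs one rotation of the running value, one rotation of the outgoing symbol's table entry, and two XORs, regardless of n. This is the sense in which the family is recursive. The digest has L-n+1 bits, so the word size must be at least n.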
