ShockHash: Near Optimal-Space Minimal Perfect Hashing Beyond Brute-Force
Hans-Peter Lehmann, Peter Sanders, Stefan Walzer

TL;DR
ShockHash introduces a novel approach to minimal perfect hashing using small, heavily overloaded cuckoo hash tables. It significantly improves the time-space trade-off compared to brute-force methods and enables fast, near-optimal MPHF construction for large datasets.
Contribution
The paper presents ShockHash, a new method that improves MPHF construction efficiency and space usage by leveraging pseudoforests and cuckoo hashing, surpassing previous brute-force algorithms.
Findings
Reduces the expected number of seed trials from about e^n to about (e/2)^n, roughly a factor of 2^n fewer than brute force.
Achieves near-optimal space of 1.489 bits per key for 10 million keys.
Speeds up MPHF construction within the RecSplit framework by up to a factor of 1000.
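The trial counts in the findings can be checked with a few lines of arithmetic. The sketch below is an illustration rather than code from the paper. It computes log2 of the exact brute-force expectation n^n/n! and the leading-order counts e^n and (e/2)^n. Polynomial factors are ignored for ShockHash. The sketch assumes that storing the first successful seed costs about log2 of the expected trial count in bits. Combined with the roughly n bits of the 1-bit retrieval structure mentioned in the abstract, ShockHash's leading-order total comes out at log2(e/2) + 1 = log2(e) bits per key.

```python
import math

def log2_brute_force_trials(n: int) -> float:
    # Expected brute-force trials are n^n / n!: a uniformly random function
    # S -> {0, ..., n-1} is a bijection with probability n! / n^n.
    # lgamma(n + 1) = ln(n!), so working in log space avoids float overflow.
    return n * math.log2(n) - math.lgamma(n + 1) / math.log(2)

def log2_leading_order_trials(n: int, base: float) -> float:
    # log2(base^n) for the leading-order counts e^n (brute force) and
    # (e/2)^n (ShockHash); polynomial factors in n are ignored.
    return n * math.log2(base)

def shockhash_bits_per_key() -> float:
    # Leading order only (an assumption of this sketch): a seed of about
    # log2((e/2)^n) bits plus about n bits for the retrieval structure storing f.
    return math.log2(math.e / 2) + 1.0

if __name__ == "__main__":
    for n in (16, 64, 1000):
        bf = log2_brute_force_trials(n)
        gap = (log2_leading_order_trials(n, math.e)
               - log2_leading_order_trials(n, math.e / 2))
        print(f"n={n}: brute force ~2^{bf:.1f} trials ({bf / n:.3f} bits/key), "
              f"e^n / (e/2)^n = 2^{gap:.1f}")
    print(f"ShockHash, leading order: {shockhash_bits_per_key():.4f} bits/key")
```

The gap between the two leading-order counts is exactly n in log2 space, which is the factor of 2^n. For n = 1000, the exact brute-force figure already sits close to the log2(e) ≈ 1.44 bits-per-key lower bound.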
Abstract
A minimal perfect hash function (MPHF) maps a set $S$ of $n$ keys to the first $n$ integers without collisions. There is a lower bound of $n \log_2 e \approx 1.44n$ bits needed to represent an MPHF. This can be reached by a brute-force algorithm that tries $e^n$ hash function seeds in expectation and stores the first seed leading to an MPHF. The most space-efficient previous algorithms for constructing MPHFs all use such a brute-force approach as a basic building block. In this paper, we introduce ShockHash (Small, heavily overloaded cuckoo hash tables) for minimal perfect hashing. ShockHash uses two hash functions $h_0$ and $h_1$, hoping for the existence of a function $f : S \to \{0, 1\}$ such that $x \mapsto h_{f(x)}(x)$ is an MPHF on $S$. It then uses a 1-bit retrieval data structure to store $f$ using $n + o(n)$ bits. In graph terminology, ShockHash generates $n$-edge random graphs until stumbling on a pseudoforest, where…
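To make the construction in the abstract concrete, here is a minimal Python sketch for small n. It rests on several assumptions. The seeded hash functions `h(i, seed, key, n)` are hypothetical, derived from `hashlib.blake2b`. The map `f` is kept in a plain dict as a stand-in for the 1-bit retrieval data structure; a real retrieval structure does not store the keys, which is how it stays at n + o(n) bits. The orientation step (peel degree-1 nodes, then walk each remaining cycle) is one standard way to recover f, not necessarily the paper's implementation. The acceptance test is the pseudoforest check. With n edges on n nodes, a pseudoforest must have exactly one cycle per component. That is exactly the condition under which every key can be given its own table cell.

```python
import hashlib
from collections import deque

def h(i: int, seed: int, key: str, n: int) -> int:
    # Hypothetical seeded hash functions h_0, h_1 built from BLAKE2b.
    d = hashlib.blake2b(f"{i}|{seed}|{key}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "little") % n

def is_pseudoforest(n: int, edges) -> bool:
    # Union-find over nodes; each component may absorb at most one cycle.
    parent, has_cycle = list(range(n)), [False] * n
    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            if has_cycle[ru]:
                return False  # a second cycle in this component
            has_cycle[ru] = True
        else:
            if has_cycle[ru] and has_cycle[rv]:
                return False  # merging two cyclic components gives two cycles
            parent[ru] = rv
            has_cycle[rv] = has_cycle[ru] or has_cycle[rv]
    return True

def orient(n: int, edges) -> list:
    # Map each edge (key) to one endpoint so every node gets exactly one edge.
    # Assumes n edges on n nodes forming a pseudoforest.
    adj, deg = [[] for _ in range(n)], [0] * n
    for e, (u, v) in enumerate(edges):
        adj[u].append(e); adj[v].append(e)
        deg[u] += 1; deg[v] += 1
    target = [None] * len(edges)
    def other(e: int, node: int) -> int:
        u, v = edges[e]
        return v if u == node else u
    # Peel tree parts: a node with one unassigned edge must take that edge.
    queue = deque(v for v in range(n) if deg[v] == 1)
    while queue:
        v = queue.popleft()
        if deg[v] != 1:
            continue
        e = next(ed for ed in adj[v] if target[ed] is None)
        target[e] = v
        w = other(e, v)
        deg[v] -= 1; deg[w] -= 1
        if deg[w] == 1:
            queue.append(w)
    # The rest are disjoint cycles (self-loops included): orient each around.
    for start in range(n):
        cur = start
        while any(target[ed] is None for ed in adj[cur]):
            e = next(ed for ed in adj[cur] if target[ed] is None)
            cur = other(e, cur)
            target[e] = cur
    return target

def build(keys, max_seeds: int = 1_000_000):
    # Try seeds until the key graph is a pseudoforest, then derive f.
    n = len(keys)
    for seed in range(max_seeds):
        edges = [(h(0, seed, x, n), h(1, seed, x, n)) for x in keys]
        if is_pseudoforest(n, edges):
            target = orient(n, edges)
            # f(x) = 0 if x sits at h_0(x), else 1 (dict stands in for retrieval).
            f = {x: 0 if target[k] == edges[k][0] else 1 for k, x in enumerate(keys)}
            return seed, f
    raise RuntimeError("no accepted seed found")

def query(seed: int, f: dict, key: str, n: int) -> int:
    # Evaluates x -> h_{f(x)}(x).
    return h(f[key], seed, key, n)

if __name__ == "__main__":
    keys = [f"key{i}" for i in range(12)]
    seed, f = build(keys)
    print(seed, sorted(query(seed, f, x, len(keys)) for x in keys))
```

Because the expected number of seeds grows exponentially in n, this is only practical for small key sets.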
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression · Caching and Content Delivery
