TL;DR
This paper introduces rotation fitting and multi-level parallelism to accelerate the construction of RecSplit-based minimal perfect hash functions, achieving speedups of up to 239x on an 8-core CPU and up to 5438x on a GPU while also improving energy efficiency.
Contribution
It presents rotation fitting, a new technique that reduces the number of hash functions tried during the search, and exploits parallelism at the level of bits, vectors, cores, and GPUs to speed up RecSplit construction.
Findings
Speedup of up to 239x on 8-core CPU
Speedup of up to 5438x using GPU
Construction time for 5 million objects reduced from 1.5 hours (original single-threaded RecSplit) to 5 seconds
Abstract
A minimal perfect hash function (MPHF) bijectively maps a set S of objects to the first |S| integers. It can be used as a building block in databases and data compression. RecSplit [Esposito et al., ALENEX'20] is currently the most space-efficient practical minimal perfect hash function. It relies heavily on trying out hash functions in a brute-force way. We introduce rotation fitting, a new technique that makes the search more efficient by drastically reducing the number of tried hash functions. Additionally, we greatly improve the construction time of RecSplit by harnessing parallelism on the level of bits, vectors, cores, and GPUs. In combination, the resulting improvements yield speedups of up to 239 on an 8-core CPU and up to 5438 using a GPU. The original single-threaded RecSplit implementation needs 1.5 hours to construct an MPHF for 5 million objects with 1.56 bits per object. On…
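To make the search described in the abstract concrete, here is a minimal Python sketch of the two strategies on one small set of keys. It is an illustration under stated assumptions, not the authors' implementation: the blake2b-based hash `h`, the helper `side`, and all function names are hypothetical. The rotation step reflects one reading of the technique: split the keys into two groups, hash each group into a bit mask, and test every cyclic rotation of the second mask against the first, so that one hash seed yields many candidate mappings.

```python
import hashlib


def h(key: str, seed: int, m: int) -> int:
    """Seeded hash of `key` into range(m). blake2b is a stand-in (assumption)."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % m


def side(key: str) -> int:
    """Fixed 1-bit hash that assigns each key to group A (0) or B (1)."""
    return hashlib.blake2b(key.encode(), digest_size=1, person=b"side").digest()[0] & 1


def rotl(mask: int, r: int, m: int) -> int:
    """Rotate an m-bit mask left by r positions (0 <= r < m)."""
    return ((mask << r) | (mask >> (m - r))) & ((1 << m) - 1)


def brute_force_leaf(keys):
    """Plain brute force: try seeds until h(., seed) is a bijection onto range(m)."""
    m = len(keys)
    full = (1 << m) - 1
    seed = 0
    while True:
        mask = 0
        for k in keys:
            mask |= 1 << h(k, seed, m)
        if mask == full:  # all m slots hit, so no two keys collided
            return seed
        seed += 1


def rotation_fit_leaf(keys):
    """Rotation-style search: per seed, test all m rotations of group B's mask."""
    m = len(keys)
    full = (1 << m) - 1
    seed = 0
    while True:
        mask_a = mask_b = 0
        for k in keys:
            bit = 1 << h(k, seed, m)
            if side(k):
                mask_b |= bit
            else:
                mask_a |= bit
        for r in range(m):
            # m keys filling all m slots implies the combined mapping is a bijection
            if mask_a | rotl(mask_b, r, m) == full:
                return seed, r
        seed += 1


def position(key: str, seed: int, r: int, m: int) -> int:
    """Final slot of `key`: group B keys are shifted by the chosen rotation."""
    p = h(key, seed, m)
    return (p + r) % m if side(key) else p


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(8)]
    seed, r = rotation_fit_leaf(keys)
    print(sorted(position(k, seed, r, len(keys)) for k in keys))
```

Each seed costs one hash evaluation per key in both variants, but the rotation variant checks m candidate mappings per seed using only cheap bit operations, which is why fewer hash functions need to be tried.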
