TL;DR
PHOBIC introduces an optimized method for constructing minimal perfect hash functions (MPHFs). It improves space efficiency and construction speed by characterizing the optimal distribution of bucket sizes and by coding the seeds in an interleaved layout, and a GPU implementation further accelerates construction.
Contribution
It gives a closed-form solution for the optimal bucket size distribution and a new interleaved encoding scheme for the seeds, improving the trade-off between space, construction speed, and query time of MPHFs.
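The summary names interleaved coding without describing its layout. Below is a minimal sketch, assuming the keys are split into partitions that all use the same number of buckets, and that the seeds of the i-th bucket of every partition are stored next to each other. The function names are hypothetical, and the per-index fixed-width code is only a stand-in for whatever compressor PHOBIC actually tunes per bucket index. It illustrates why grouping helps: seeds that play the same role across partitions can share one code.

```python
from typing import List, Sequence


def interleave_seeds(partitions: Sequence[Sequence[int]]) -> List[int]:
    """Store the seed of bucket i of every partition consecutively.

    Hypothetical layout: with P partitions of B buckets each, the seed of
    bucket i in partition p lands at index i * P + p.
    """
    num_partitions = len(partitions)
    num_buckets = len(partitions[0]) if partitions else 0
    if any(len(seeds) != num_buckets for seeds in partitions):
        raise ValueError("every partition must have the same number of buckets")
    # Outer loop over bucket index i, inner loop over partitions p.
    return [partitions[p][i] for i in range(num_buckets) for p in range(num_partitions)]


def lookup_seed(interleaved: Sequence[int], num_partitions: int,
                partition: int, bucket: int) -> int:
    """Random access into the interleaved layout (no decompression modeled)."""
    return interleaved[bucket * num_partitions + partition]


def fixed_width_cost(interleaved: Sequence[int], num_partitions: int) -> int:
    """Total bits if each bucket index gets its own fixed-width code.

    The width for index i is chosen only from the seeds of bucket i across
    all partitions, so small-seed positions are not charged for large ones.
    """
    total = 0
    for start in range(0, len(interleaved), num_partitions):
        group = interleaved[start:start + num_partitions]
        width = max(max(group).bit_length(), 1)
        total += width * len(group)
    return total
```

For example, with `parts = [[0, 1, 9], [1, 0, 12], [0, 1, 7]]`, `interleave_seeds(parts)` gives `[0, 1, 0, 1, 0, 1, 9, 12, 7]`, and `fixed_width_cost` charges 1 bit to each of the first six seeds and 4 bits to each of the last three (18 bits), whereas a single global 4-bit width would need 36 bits.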
Findings
PHOBIC uses 0.17 bits/key less space than PTHash.
A GPU implementation constructs an MPHF at 2.17 bits/key in 28 ns per key.
Functions built with the GPU implementation are queried in 37 ns on the CPU.
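To put these figures in concrete terms, here is a back-of-envelope calculation for a hypothetical input of one billion keys. It assumes space and build time scale linearly with the number of keys and ignores fixed overheads. It also applies the 0.17 bits/key saving to the same n, although the comparison with PTHash may refer to a different configuration than the 2.17 bits/key GPU one.

```python
def mphf_footprint(num_keys: int, bits_per_key: float, build_ns_per_key: float):
    """Back-of-envelope space (bytes) and build time (seconds) for num_keys keys."""
    size_bytes = num_keys * bits_per_key / 8
    build_seconds = num_keys * build_ns_per_key / 1e9
    return size_bytes, build_seconds


n = 10**9  # hypothetical input size: one billion keys
size_bytes, build_seconds = mphf_footprint(n, bits_per_key=2.17, build_ns_per_key=28)
saved_bytes = n * 0.17 / 8  # reported 0.17 bits/key saving vs. PTHash, scaled to n

print(f"GPU config: {size_bytes / 1e6:.2f} MB, built in {build_seconds:.1f} s")
print(f"Saving vs. PTHash: {saved_bytes / 1e6:.2f} MB")
```

This prints 271.25 MB built in 28.0 s, and a saving of 21.25 MB: at this scale the structure is small enough to sit comfortably in main memory.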
Abstract
A minimal perfect hash function (MPHF) maps a set of n keys to {1, ..., n} without collisions. Such functions are widely used, for example in bioinformatics and databases. In this paper we revisit PTHash, a construction technique particularly designed for fast queries. PTHash distributes the input keys into small buckets and, for each bucket, searches for a hash function seed that places its keys in the output domain without collisions. The collection of all seeds is then stored in a compressed way. Since buckets placed early are easier to place (most of the output domain is still free), buckets are considered in non-increasing order of size. Additionally, PTHash heuristically produces an imbalanced distribution of bucket sizes by distributing 60% of the keys into 30% of the buckets. Our main contribution is to characterize, up to lower order terms, an optimal distribution of expected bucket sizes. We arrive at a simple,…
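The construction described in the abstract can be sketched as a toy program. This is a minimal sketch under several assumptions, not PTHash's or PHOBIC's actual code. Keys are integers hashed with SplitMix64, and a key's slot under a seed is `mix64(key ^ mix64(seed + 1)) % n`, a hypothetical stand-in for PTHash's seed mixing. Outputs are 0-indexed in {0, ..., n-1} rather than {1, ..., n}. Seeds are stored uncompressed. The bucketing follows the PTHash 60%/30% heuristic, not PHOBIC's optimized distribution, whose formula is not given here. The names `build`, `ToyMPHF`, `bucket_of`, and `position` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence

MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """SplitMix64 output function: a well-mixed 64-bit hash of an integer."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def bucket_of(key: int, num_buckets: int) -> int:
    """Skewed bucketing: about 60% of keys go to the first 30% of buckets."""
    u = mix64(key ^ 0x5BD1E995) / 2.0**64  # roughly uniform in [0, 1)
    dense = max(1, int(0.3 * num_buckets))
    if u < 0.6:
        b = int(u / 0.6 * dense)
    else:
        b = dense + int((u - 0.6) / 0.4 * (num_buckets - dense))
    return min(b, num_buckets - 1)


def position(key: int, seed: int, n: int) -> int:
    """Output slot of `key` under a bucket seed, in {0, ..., n-1}."""
    return mix64(key ^ mix64(seed + 1)) % n


@dataclass
class ToyMPHF:
    n: int
    seeds: List[int]  # one seed per bucket; a real MPHF would compress these

    def __call__(self, key: int) -> int:
        b = bucket_of(key, len(self.seeds))
        return position(key, self.seeds[b], self.n)


def build(keys: Sequence[int], num_buckets: int, max_seed: int = 1 << 20) -> ToyMPHF:
    n = len(keys)
    if len(set(keys)) != n:
        raise ValueError("keys must be distinct")
    buckets: List[List[int]] = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[bucket_of(key, num_buckets)].append(key)

    taken = [False] * n
    seeds = [0] * num_buckets
    # Largest buckets first, while most of the output domain is still free.
    for b in sorted(range(num_buckets), key=lambda i: len(buckets[i]), reverse=True):
        for seed in range(max_seed):
            slots = [position(k, seed, n) for k in buckets[b]]
            # Accept the first seed whose slots are distinct and all unused.
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                for s in slots:
                    taken[s] = True
                seeds[b] = seed
                break
        else:
            raise RuntimeError(f"no seed found for bucket {b}; try more buckets")
    return ToyMPHF(n, seeds)
```

Usage: `f = build([k * 7919 + 13 for k in range(1000)], num_buckets=300)`. Because every accepted seed claims only unused slots and there are exactly n slots, the keys map one-to-one onto `range(1000)`. Placing the smallest buckets last means the final, nearly full steps only need to find a free slot for one or two keys.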
