PaCHash: Packed and Compressed Hash Tables
Florian Kurpicz, Hans-Peter Lehmann, Peter Sanders

TL;DR
PaCHash is a static external hash table that stores variable-sized, individually compressed objects contiguously. It achieves low space consumption and needs only a constant number of bits of internal memory per block and one disk access per search.
Contribution
The paper introduces a static external hash table design that packs variable-sized, compressible objects without intervening space, locates them in constant expected time, and needs only a constant number of bits of internal memory per block of external memory.
Findings
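In the authors' experiments, space consumption is lower than that of previous methods.
Each search operation requires only one disk access (of variable length).
An implementation for fast SSDs needs about 5 bits of internal memory per block of external memory.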
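To give a rough sense of scale for the figure of about 5 bits per block, the sketch below estimates the internal index size for a given amount of external data. The 4 KiB block size and the 1 TiB data size are illustrative assumptions and do not come from the paper summary. The function name `index_bytes` is hypothetical.

```python
import math


def index_bytes(data_bytes, block_size, bits_per_block=5.0):
    """Estimate internal memory (in bytes) for an index that costs
    `bits_per_block` bits per block of external memory."""
    blocks = math.ceil(data_bytes / block_size)
    return math.ceil(blocks * bits_per_block / 8)


# Assumed example: 1 TiB of external data in 4 KiB blocks.
size = index_bytes(2**40, 4096)
print(size / 2**20, "MiB")  # 2^28 blocks * 5 bits = 160 MiB
```

Under these assumptions the index is several thousand times smaller than the data it addresses.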
Abstract
We introduce PaCHash, a hash table that stores its objects contiguously in an array without intervening space, even if the objects have variable size. In particular, each object can be compressed using standard compression techniques. A small search data structure allows locating the objects in constant expected time. PaCHash is most naturally described as a static external hash table where it needs a constant number of bits of internal memory per block of external memory. Here, in some sense, PaCHash beats a lower bound on the space consumption of k-perfect hashing. An implementation for fast SSDs needs about 5 bits of internal memory per block of external memory, requires only one disk access (of variable length) per search operation, and has small internal search overhead compared to the disk access cost. Our experiments show that it has lower space consumption than all previous…
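The layout described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several simplifying assumptions: objects are identified by a 64-bit hash only (keys are not stored and collisions are ignored), the per-block index is a plain Python list searched with `bisect` rather than a compressed structure using a few bits per block, the starting position of the first object in each block is also kept in memory, and "disk" is an in-memory byte string. All names (`build`, `lookup`, `BLOCK_SIZE`, `HEADER`) are hypothetical.

```python
import bisect
import hashlib
import struct
import zlib

BLOCK_SIZE = 64                # assumed block size, tiny so objects span blocks
HEADER = struct.Struct(">QI")  # per object: 8-byte key hash, 4-byte payload length


def key_hash(key):
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def build(items):
    """Pack compressed objects back to back, sorted by key hash."""
    data = bytearray()
    first_hash = []   # per block: hash of the first object starting at or after it
    first_start = []  # per block: byte position where that object starts
    for h, value in sorted((key_hash(k), v) for k, v in items.items()):
        payload = zlib.compress(value)
        block = len(data) // BLOCK_SIZE
        while len(first_hash) <= block:
            first_hash.append(h)
            first_start.append(len(data))
        data += HEADER.pack(h, len(payload)) + payload  # no padding between objects
    return bytes(data), first_hash, first_start


def lookup(key, data, first_hash, first_start):
    """Find the object for `key` with one contiguous read of variable length."""
    h = key_hash(key)
    s = bisect.bisect_right(first_hash, h) - 1  # block where the object must start
    if s < 0:
        return None
    if s + 1 < len(first_hash):
        # The object ends before the next indexed object starts, which happens
        # in the last block that shares the next index value.
        e = bisect.bisect_right(first_hash, first_hash[s + 1]) - 1
        end = min((e + 1) * BLOCK_SIZE, len(data))
    else:
        end = len(data)
    begin = s * BLOCK_SIZE
    chunk = data[begin:end]        # stands in for the single disk access
    pos = first_start[s] - begin   # skip the tail of an object from an earlier block
    while pos + HEADER.size <= len(chunk):
        obj_hash, length = HEADER.unpack_from(chunk, pos)
        pos += HEADER.size
        if obj_hash == h:
            return zlib.decompress(chunk[pos:pos + length])
        if obj_hash > h:           # objects are sorted, so the key is absent
            return None
        pos += length
    return None


items = {f"key{i}".encode(): (f"value-{i}-" * (i % 7 + 1)).encode() for i in range(50)}
data, first_hash, first_start = build(items)
print(lookup(b"key7", data, first_hash, first_start))
```

Sorting by hash is what lets a small per-block index narrow a query to a short run of consecutive blocks. The index grows with the number of blocks rather than the number of objects, which mirrors the per-block memory cost described in the abstract.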
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · Caching and Content Delivery
