Simple Set Sketching
Jakob Bæk Tejs Houen, Rasmus Pagh, Stefan Walzer

TL;DR
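This paper introduces a simple set sketching method in which each hash table cell stores the bitwise exclusive-or of the keys hashing to it, and this scheme is repeated three times independently. Below a load threshold of about 0.81, the set of all inserted keys can be recovered in linear time with high probability. The data structure is a variant of the invertible Bloom filter, and its analysis is nontrivial despite the simple description.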
Contribution
It presents a collision resolution technique that combines three independent repetitions of XOR-based hashing with quotienting, allowing linear-time recovery of all inserted keys below a load threshold of about 0.81.
Findings
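Below a load factor threshold of approximately 0.81, the set of all inserted keys can be recovered in linear time with high probability.
The approach is a variant of the invertible Bloom filter (IBF) that replaces the explicit per-bucket checksum with an implicit one.
The data structure is simple to describe, but its analysis is nontrivial.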
Abstract
Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: For αn keys and n buckets, where α is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions. We show that if this collision resolution strategy is repeated three times independently the situation reverses: If α is below a threshold of ≈ 0.81 then we can recover the set of all inserted keys in linear time with high probability. Even though the description of our data structure is simple, its analysis is nontrivial. Our approach can be seen as a variant of the Invertible Bloom Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum per bucket to decide whether the bucket stores a single key, we exploit the idea of…
Taxonomy
Topics: Caching and Content Delivery · DNA and Biological Computing · Algorithms and Data Compression
