Huffman-Bucket Sketch: A Simple $O(m)$ Algorithm for Cardinality Estimation
Matti Karppa

TL;DR
The paper presents Huffman-Bucket Sketch, a space-efficient, mergeable data structure for cardinality estimation that losslessly compresses HyperLogLog. It reduces memory usage while keeping updates fast, with both theoretical guarantees and preliminary practical evidence.
Contribution
Introducing Huffman-Bucket Sketch, a new lossless compression method for HyperLogLog that achieves optimal space and amortized constant-time updates while preserving mergeability.
Findings
Achieves optimal space complexity of O(m + log n) bits.
Rebuilds Huffman tree only O(log n) times during streaming.
Preliminary results suggest practical efficiency and competitiveness.
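The abstract attributes the O(log n) rebuild bound to rebuilding the Huffman tree roughly when the cardinality doubles. The minimal Python sketch below uses a hypothetical doubling rule (the function `count_rebuilds` and its `factor` parameter are illustrative, not the paper's exact trigger) to show where the logarithm comes from. If the estimate rises from 1 to n, rebuilds fire at 1, 2, 4, 8 and so on, which gives floor(log2 n) + 1 rebuilds in total.

```python
def count_rebuilds(estimates, factor: float = 2.0) -> int:
    """Count codebook rebuilds under a hypothetical 'rebuild on doubling' rule.

    `estimates` is the sequence of cardinality estimates observed after each
    update. A rebuild fires whenever the estimate reaches `factor` times the
    estimate recorded at the previous rebuild (the first positive estimate
    always triggers the initial build).
    """
    rebuilds = 0
    last = None
    for est in estimates:
        if est <= 0:
            continue  # no distinct items yet, nothing to build
        if last is None or est >= factor * last:
            rebuilds += 1
            last = est
    return rebuilds


# Estimates growing 1, 2, ..., 1_000_000: rebuilds at 2**0 through 2**19.
print(count_rebuilds(range(1, 1_000_001)))  # → 20
```

A real estimator is noisy, so an actual trigger would likely use some slack around the threshold. The logarithmic count comes from the geometric spacing of the thresholds in any case.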
Abstract
We introduce the Huffman-Bucket Sketch (HBS), a simple, mergeable data structure that losslessly compresses a HyperLogLog (HLL) sketch with $m$ registers to the optimal $O(m + \log n)$ bits, with amortized constant-time updates. HBS acts as a drop-in replacement for HLL that retains mergeability and substantially reduces memory requirements. We partition registers into small buckets and encode their values with a global Huffman codebook derived from the strongly concentrated HLL rank distribution, using the current cardinality estimate to determine the mode of the distribution. We prove that the Huffman tree needs rebuilding only $O(\log n)$ times over a stream, roughly when the cardinality doubles. The framework can be extended to other sketches with similarly concentrated distributions. We provide preliminary numerical evidence that suggests that HBS is practical and can…
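The minimal Python sketch below shows why a Huffman codebook over HLL register values pays off. It is an illustration under stated assumptions, not the paper's construction. It assumes a Poissonized model in which each register receives about λ = n/m items, and each item has rank r ≥ 1 with probability 2^-r. Under that model, a register holds a value ≤ r with probability exp(-λ·2^-r). The sketch codes single register values, so the paper's bucket layout is not reproduced. The helper names `register_pmf`, `huffman_code_lengths` and `expected_bits_per_register` are hypothetical. The distribution's mode sits near log2 λ, so the codebook depends on the cardinality estimate.

```python
import heapq
import itertools
import math


def register_pmf(lam: float, r_max: int = 40) -> list[float]:
    """Approximate pmf of one HLL register value (assumed Poissonized model).

    P(R <= r) = exp(-lam * 2**-r), and R = 0 means an empty register. For
    r >= 1 the mass exp(-a) - exp(-2a), with a = lam * 2**-r, is computed via
    expm1 for numerical stability. Mass above r_max is dropped (assumed tiny).
    """
    pmf = [math.exp(-lam)]
    for r in range(1, r_max + 1):
        a = lam * 2.0 ** -r
        pmf.append(-math.exp(-a) * math.expm1(-a))
    return pmf


def huffman_code_lengths(weights: list[float]) -> list[int]:
    """Return the Huffman codeword length for each symbol index."""
    if len(weights) == 1:
        return [1]
    tie = itertools.count()  # tie-breaker so heapq never compares the lists
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # every symbol under the merged node gets one bit deeper
        heapq.heappush(heap, (w1 + w2, next(tie), s1 + s2))
    return lengths


def expected_bits_per_register(lam: float, r_max: int = 40) -> tuple[float, float]:
    """Return (expected Huffman code length, entropy) in bits per register."""
    pmf = register_pmf(lam, r_max)
    lengths = huffman_code_lengths(pmf)
    avg = sum(p * l for p, l in zip(pmf, lengths))
    entropy = -sum(p * math.log2(p) for p in pmf if p > 0)
    return avg, entropy


for lam in (1.0, 100.0, 10_000.0):
    avg, h = expected_bits_per_register(lam)
    print(f"lambda={lam:>8}: Huffman {avg:.2f} bits/register, entropy {h:.2f}")
```

The printed figures come from this toy model, not from the paper's experiments. Because the distribution stays concentrated around its mode as λ grows, the expected code length remains a few bits per register. A plain HLL register has a fixed width, typically 5 or 6 bits.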
Taxonomy
Topics: Algorithms and Data Compression · Advanced Database Systems and Queries · Data Management and Algorithms
