Huffman-Bucket Sketch: A Simple $O(m)$ Algorithm for Cardinality Estimation

Matti Karppa

arXiv:2603.10930·cs.DS·March 12, 2026

Open Access

TL;DR

The paper presents the Huffman-Bucket Sketch (HBS), a mergeable data structure for cardinality estimation that losslessly compresses a HyperLogLog sketch with $m$ registers to $O(m + \log n)$ bits while keeping amortized constant-time updates.

Contribution

Introduces the Huffman-Bucket Sketch, a lossless compression scheme for HyperLogLog that achieves optimal $O(m + \log n)$-bit space and amortized constant-time updates while preserving mergeability.

Findings

01

Achieves optimal space complexity of $O(m + \log n)$ bits.

02

Rebuilds the Huffman tree only $O(\log n)$ times during streaming.

03

Preliminary results suggest practical efficiency and competitiveness.
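
To illustrate finding 02, here is a minimal sketch of a doubling-based rebuild schedule. It is not the paper's procedure: the function name `count_rebuilds` and the `growth` parameter are hypothetical, and the exact running count of distinct items stands in for the sketch's cardinality estimate.

```python
def count_rebuilds(n, growth=2.0):
    """Count codebook rebuilds over a stream of n distinct items.

    Assumption (not from the paper): a rebuild is triggered whenever the
    cardinality has grown by a factor of `growth` since the last rebuild,
    and the exact count is used in place of the sketch's estimate.
    """
    rebuilds = 0
    next_threshold = 1.0  # first rebuild happens on the first item
    for cardinality in range(1, n + 1):
        if cardinality >= next_threshold:
            rebuilds += 1
            next_threshold = cardinality * growth
    return rebuilds


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(n, count_rebuilds(n))
```

Under these assumptions the rebuild count grows like $\lfloor \log_2 n \rfloor + 1$, which is consistent with the $O(\log n)$ bound stated above.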

Abstract

We introduce the Huffman-Bucket Sketch (HBS), a simple, mergeable data structure that losslessly compresses a HyperLogLog (HLL) sketch with $m$ registers to optimal space $O(m + \log n)$ bits, with amortized constant-time updates, acting as a drop-in replacement for HLL that retains mergeability and substantially reduces memory requirements. We partition registers into small buckets and encode their values with a global Huffman codebook derived from the strongly concentrated HLL rank distribution, using the current cardinality estimate for determining the mode of the distribution. We prove that the Huffman tree needs rebuilding only $O(\log n)$ times over a stream, roughly when cardinality doubles. The framework can be extended to other sketches with similar strongly concentrated distributions. We provide preliminary numerical evidence that suggests that HBS is practical and can…
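
The codebook idea in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's construction: the register distribution uses the standard Poisson approximation $P(\text{register} \le k) \approx e^{-(n/m)/2^k}$, and the names `hll_rank_pmf`, `huffman_code_lengths`, and the cap `max_rank` are hypothetical. Bucketing and the bit-level layout are omitted.

```python
import heapq
import math


def hll_rank_pmf(n, m, max_rank=32):
    """Approximate pmf of one HLL register value for n items, m registers.

    Assumption: Poisson model with rate n/m per register, so that
    P(register <= k) = exp(-(n/m) / 2**k); values are capped at max_rank.
    """
    lam = n / m
    cdf = [math.exp(-lam / 2.0 ** k) for k in range(max_rank)] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, max_rank + 1)]


def huffman_code_lengths(pmf):
    """Return the Huffman code length (in bits) for each symbol of pmf."""
    # Heap entries: (probability, smallest symbol in group, symbols in group).
    heap = [(p, sym, [sym]) for sym, p in enumerate(pmf)]
    heapq.heapify(heap)
    lengths = [0] * len(pmf)
    while len(heap) > 1:
        p1, t1, s1 = heapq.heappop(heap)
        p2, t2, s2 = heapq.heappop(heap)
        for sym in s1 + s2:  # every symbol under the merged node gets one bit deeper
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, min(t1, t2), s1 + s2))
    return lengths


if __name__ == "__main__":
    n, m = 1_000_000, 1024
    pmf = hll_rank_pmf(n, m)
    lengths = huffman_code_lengths(pmf)
    expected_bits = sum(p * l for p, l in zip(pmf, lengths))
    mode = max(range(len(pmf)), key=pmf.__getitem__)
    print(f"mode rank: {mode}, log2(n/m): {math.log2(n / m):.2f}")
    print(f"expected bits per register: {expected_bits:.3f}")
```

Because the distribution is concentrated around $\log_2(n/m)$, the few values near the mode receive short codewords, and the expected cost per register is a small constant rather than the fixed register width of a plain HLL sketch.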

Taxonomy

Topics: Algorithms and Data Compression · Advanced Database Systems and Queries · Data Management and Algorithms