UltraLogLog: A Practical and More Space-Efficient Alternative to HyperLogLog for Approximate Distinct Counting
Otmar Ertl

TL;DR
UltraLogLog is a new approximate distinct counting algorithm that is more space-efficient than HyperLogLog while keeping its practical properties. Its space savings are supported by both theoretical analysis and experiments, and it is available as open-source software.
Contribution
The paper introduces UltraLogLog, a more space-efficient variant of HyperLogLog, together with new estimators and implementation optimizations for practical use.
Findings
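Requires 28% less space than HyperLogLog for the same accuracy when using the maximum-likelihood estimator
Achieves a 24% space reduction with a simpler, faster estimator whose speed is comparable to HyperLogLog's
Experimental results confirm the theoretically predicted space savings and efficiency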
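As a rough illustration of what these percentages mean in bytes, the snippet below scales a HyperLogLog memory budget by the reported reductions. The function name `ultraloglog_bytes` and the 4096-byte budget are hypothetical; the 17% martingale figure comes from the abstract and is assumed here to be measured against HyperLogLog in the same non-distributed setting.

```python
# Space reductions relative to HyperLogLog at equal accuracy, in percent,
# as reported in the abstract. The martingale baseline is an assumption.
REDUCTION_PERCENT = {
    "maximum likelihood": 28,
    "faster estimator": 24,
    "martingale (non-distributed)": 17,
}


def ultraloglog_bytes(hll_bytes: float) -> dict[str, float]:
    """Approximate UltraLogLog size matching the accuracy of a HyperLogLog
    sketch that occupies `hll_bytes` bytes (hypothetical helper)."""
    return {
        name: hll_bytes * (100 - pct) / 100
        for name, pct in REDUCTION_PERCENT.items()
    }


if __name__ == "__main__":
    for name, size in ultraloglog_bytes(4096).items():
        print(f"{name}: ~{size:.0f} bytes")
    # maximum likelihood: ~2949 bytes
    # faster estimator: ~3113 bytes
    # martingale (non-distributed): ~3400 bytes
```

This is only proportional scaling of the headline numbers; actual sketch sizes depend on the chosen precision and register layout.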
Abstract
Since its invention, HyperLogLog has become the standard algorithm for approximate distinct counting. Due to its space efficiency and suitability for distributed systems, it is widely used and also implemented in numerous databases. This work presents UltraLogLog, which shares the same practical properties as HyperLogLog. It is commutative, idempotent, mergeable, and has a fast guaranteed constant-time insert operation. At the same time, it requires 28% less space to encode the same amount of distinct count information, which can be extracted using the maximum likelihood method. Alternatively, a simpler and faster estimator is proposed, which still achieves a space reduction of 24%, but at an estimation speed comparable to that of HyperLogLog. In a non-distributed setting where martingale estimation can be used, UltraLogLog is able to reduce space by 17%. Moreover, its smaller entropy…
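The practical properties listed in the abstract (commutative, idempotent, mergeable, constant-time insert) follow from how registers are updated. The Python sketch below illustrates them under a stated assumption: each register records the largest observed update value u and whether u-1 and u-2 were also observed, which fits in one byte. The BLAKE2b hashing, the precision `P = 4`, and all function names are hypothetical choices; the paper's exact bit layout and its estimators are not reproduced.

```python
import hashlib

P = 4                      # hypothetical precision: 2**P registers
NUM_REGISTERS = 1 << P


def hash64(item: str) -> int:
    """64-bit hash of a string (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def index_and_update_value(h: int) -> tuple[int, int]:
    """HyperLogLog-style split: the top P bits pick a register; the update
    value is 1 + the number of leading zeros in the remaining 64 - P bits."""
    idx = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    k = (64 - P) - rest.bit_length() + 1   # k in 1 .. 64 - P + 1
    return idx, k


def truncate(mask: int) -> int:
    """Keep bit u (largest observed update value) and bits u-1, u-2."""
    if mask == 0:
        return 0
    u = mask.bit_length() - 1
    return mask & (0b111 << max(u - 2, 0))


def new_sketch() -> list[int]:
    return [0] * NUM_REGISTERS


def insert(regs: list[int], item: str) -> None:
    """Constant-time insert: OR in the update value, then truncate."""
    idx, k = index_and_update_value(hash64(item))
    regs[idx] = truncate(regs[idx] | (1 << k))


def merge(a: list[int], b: list[int]) -> list[int]:
    """Register-wise union; OR is commutative and idempotent."""
    return [truncate(x | y) for x, y in zip(a, b)]


def encode(mask: int) -> int:
    """Pack a register into one byte: u in the high 6 bits, 2 flag bits."""
    if mask == 0:
        return 0
    u = mask.bit_length() - 1
    f1 = (mask >> (u - 1)) & 1            # was u-1 observed?
    f0 = (mask >> (u - 2)) & 1 if u >= 2 else 0   # was u-2 observed?
    return (u << 2) | (f1 << 1) | f0


if __name__ == "__main__":
    words = ["apple", "banana", "cherry", "date"]
    s1, s2 = new_sketch(), new_sketch()
    for w in words:
        insert(s1, w)
    for w in reversed(words * 2):         # different order, with duplicates
        insert(s2, w)
    print(s1 == s2)                       # True
```

Because every update is a bitwise OR followed by the same truncation, the register state depends only on the set of inserted elements, not on their order or multiplicity, and two sketches can be merged register by register.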
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Advanced Database Systems and Queries
