LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting

Jason Qin; Denys Kim; Yumei Tung

arXiv:1612.02284·cs.DS·December 22, 2020·2 cites

TL;DR

LogLog-Beta is a simplified, efficient cardinality estimation algorithm based on LogLog counting that requires only one formula and achieves accuracy comparable to or better than HyperLogLog variants.

Contribution

Introduces LogLog-Beta, a single-formula algorithm for cardinality estimation that needs no additional bias corrections and, in the authors' simulations, is at least as accurate as HyperLogLog and HyperLogLog++.

Findings

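01. In the authors' simulations, LogLog-Beta is as accurate as or more accurate than both HyperLogLog and HyperLogLog++.

02. The algorithm uses a single formula with no additional bias corrections across the entire range of cardinalities, which makes it simpler and more efficient to implement.

03. The paper also provides a second one-formula estimator based on order statistics, a modification of an algorithm developed by Lumbroso.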

Abstract

This paper defines LogLog-Beta, a new algorithm for estimating cardinalities based on LogLog counting. The new algorithm uses only one formula and needs no additional bias corrections for the entire range of cardinalities; it is therefore more efficient and simpler to implement. Our simulations show that the accuracy of the new algorithm is as good as or better than that of either HyperLogLog or HyperLogLog++. In addition to LogLog-Beta, we also provide another one-formula estimator for cardinalities based on order statistics, a modification of an algorithm developed by Lumbroso.
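To make the "one formula" claim concrete, the following is a minimal Python sketch of a LogLog-Beta style estimate. It assumes the commonly cited form of the estimator, E = alpha_m * m * (m - z) / (beta(m, z) + sum of 2^(-M[i])), where M holds the m registers, z is the number of zero registers, and beta(m, z) is a correction term that vanishes at z = 0. The helper names (`alpha`, `build_registers`, `loglog_beta_estimate`), the SHA-256 based hash, and the 64-bit hash width are illustrative assumptions and are not taken from this page. The fitted coefficients of beta are not reproduced here, so `beta` is supplied by the caller.

```python
import hashlib
from typing import Callable, Iterable, List


def alpha(m: int) -> float:
    """Standard HyperLogLog bias constant for m registers."""
    if m == 16:
        return 0.673
    if m == 32:
        return 0.697
    if m == 64:
        return 0.709
    return 0.7213 / (1.0 + 1.079 / m)


def build_registers(items: Iterable[str], p: int) -> List[int]:
    """Fill m = 2**p registers from 64-bit hashes (hash choice is an assumption)."""
    m = 1 << p
    registers = [0] * m
    width = 64 - p
    for item in items:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> width                      # first p bits select the register
        rest = h & ((1 << width) - 1)         # remaining bits give the rank
        rank = width - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def loglog_beta_estimate(
    registers: List[int],
    beta: Callable[[int, int], float],
) -> float:
    """Single-formula estimate; `beta` is a caller-supplied correction term."""
    m = len(registers)
    z = registers.count(0)                    # number of empty registers
    harmonic = sum(2.0 ** -r for r in registers)
    return alpha(m) * m * (m - z) / (beta(m, z) + harmonic)


def zero_beta(m: int, z: int) -> float:
    """Placeholder only: a real beta must be fitted for each precision."""
    return 0.0
```

With no zero registers and a correction term that vanishes at z = 0, the expression reduces to the raw HyperLogLog estimate alpha_m * m^2 / sum of 2^(-M[i]). The `zero_beta` placeholder is only there to exercise the formula; accurate estimates at small cardinalities depend on the fitted beta described in the paper.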


Taxonomy

Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Time Series Analysis and Forecasting