Simpler and Better Cardinality Estimators for HyperLogLog and PCSA

Seth Pettie; Dingyu Wang

arXiv:2208.10578 · cs.DS · August 24, 2022

Open Access

TL;DR

This paper introduces τ-GRA (generalized remaining area) estimators, a new class of cardinality estimators for the HyperLogLog and PCSA sketches. They are simple to compute and, with a fractional value of τ, more accurate than the standard estimators for these sketches and nearly optimal.

Contribution

The paper defines τ-GRA estimators, analyzes their limiting relative variance, and shows that fractional values of τ significantly improve accuracy over the standard estimators.

Findings

01

τ-GRA estimators closely approach the Cramér-Rao bounds.

02

Fractional τ values improve estimator accuracy.

03

τ-GRA estimators are simple to compute and update.

Abstract

Cardinality Estimation (aka Distinct Elements) is a classic problem in sketching with many industrial applications. Although sketching algorithms are fairly simple, analyzing the cardinality estimators is notoriously difficult, and even today the state-of-the-art sketches such as HyperLogLog and (compressed) PCSA are not covered in graduate-level Big Data courses. In this paper we define a class of generalized remaining area ($τ$-GRA) estimators, and observe that HyperLogLog, LogLog, and some estimators for PCSA are merely instantiations of $τ$-GRA for various integral values of $τ$. We then analyze the limiting relative variance of $τ$-GRA estimators. It turns out that the standard estimators for HyperLogLog and PCSA can be improved by choosing a fractional value of $τ$. The resulting estimators come very close to the…
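For readers who have not seen the sketch the abstract refers to, the following is a minimal illustration of a HyperLogLog sketch with its standard (harmonic-mean) estimator, as it is commonly described in the literature. It is not the paper's τ-GRA estimator, whose definition is not given in the text above. The class name `HyperLogLog`, the helper `_hash64`, the use of truncated SHA-256 as the hash function, and the default of 2^12 registers are assumptions made for this example; the small-range and large-range corrections used in practice are omitted.

```python
import hashlib


def _hash64(item: str) -> int:
    """Hypothetical helper: 64-bit hash taken from the first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")


class HyperLogLog:
    """Illustrative HyperLogLog sketch with the standard raw estimator."""

    def __init__(self, p: int = 12) -> None:
        if p < 7:
            raise ValueError("the bias constant used below assumes m = 2**p >= 128")
        self.p = p
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = _hash64(item)
        width = 64 - self.p
        idx = h >> width               # top p bits select a register
        rest = h & ((1 << width) - 1)  # remaining bits determine the rank
        # Rank = 1-based position of the leftmost 1-bit among the remaining bits
        # (width + 1 if they are all zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        # Standard raw estimate: alpha_m * m^2 / sum_i 2^(-register_i).
        alpha = 0.7213 / (1.0 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m / z


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100_000):
        hll.add(f"item-{i}")
    print(f"estimated distinct count: {hll.estimate():.0f}")
```

Each register only ever stores a maximum, so re-inserting an item leaves the sketch unchanged. This is why the update step is simple even though analyzing the estimator is not.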

Taxonomy

Topics: Visual Attention and Saliency Detection · Machine Learning and Data Classification · Interactive and Immersive Displays