A statistical analysis of probabilistic counting algorithms
Peter Clifford, Ioana A. Cosma

TL;DR
This paper provides a statistical analysis of probabilistic counting algorithms for cardinality estimation in data streams. It compares methods based on order statistics with methods based on random projections, and derives cardinality estimators for both, including a maximal-term estimator with exponentially decreasing error bounds.
Contribution
It applies conventional statistical methods to analyze and compare two prominent probabilistic counting techniques, showing that their estimators have comparable asymptotic efficiency and explaining this through an unexpected connection between the two approaches.
Findings
The maximal-term estimator is recursively computable.
The maximal-term estimator has exponentially decreasing error bounds.
The order-statistic and random-projection estimators have comparable asymptotic efficiency.
Abstract
This paper considers the problem of cardinality estimation in data stream applications. We present a statistical analysis of probabilistic counting algorithms, focusing on two techniques that use pseudo-random variates to form low-dimensional data sketches. We apply conventional statistical methods to compare probabilistic algorithms based on storing either selected order statistics, or random projections. We derive estimators of the cardinality in both cases, and show that the maximal-term estimator is recursively computable and has exponentially decreasing error bounds. Furthermore, we show that the estimators have comparable asymptotic efficiency, and explain this result by demonstrating an unexpected connection between the two approaches.
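To make the order-statistic idea concrete, here is a minimal Python sketch of one textbook variant. It is an illustration under stated assumptions, not the estimator derived in the paper. Each item is hashed to a pseudo-random Exp(1) variate, and the sketch keeps only the minimum seen per hash function. The minimum over n distinct items is Exp(n), so with m independent minima the quantity (m - 1) / sum(minima) is an unbiased estimate of n. The names `exp_hash` and `MinSketch`, the salted SHA-256 hashing, and the choice of m are all hypothetical.

```python
import hashlib
import math


def exp_hash(item: str, salt: int) -> float:
    """Map (item, salt) to a pseudo-random Exp(1) variate (illustrative helper)."""
    digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
    # Take 52 bits so that (x + 0.5) / 2**52 is exact and lies strictly in (0, 1).
    x = int.from_bytes(digest[:8], "big") >> 12
    u = (x + 0.5) / 2**52
    return -math.log(u)


class MinSketch:
    """Keeps one running minimum per hash function (assumed design, not the paper's)."""

    def __init__(self, m: int = 256):
        if m < 2:
            raise ValueError("m must be at least 2")
        self.mins = [math.inf] * m

    def add(self, item: str) -> None:
        # Recursive update: each stored value depends only on its previous
        # value and the new item, so duplicates leave the sketch unchanged.
        for j in range(len(self.mins)):
            self.mins[j] = min(self.mins[j], exp_hash(item, j))

    def estimate(self) -> float:
        # The sum of m independent Exp(n) minima is Gamma(m, n), so
        # (m - 1) / sum is unbiased for n. An empty sketch yields 0.0.
        return (len(self.mins) - 1) / sum(self.mins)
```

The sketch uses m stored numbers regardless of stream length, and its relative standard error is roughly 1 / sqrt(m - 2), so a larger m trades memory for accuracy.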
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Data Stream Mining Techniques
