HyperLogLog Hyper Extended: Sketches for Concave Sublinear Frequency Statistics
Edith Cohen

TL;DR
This paper introduces new compact sketches for estimating a broad class of concave sublinear frequency statistics, such as logarithms, low frequency moments, and capping statistics. It extends distinct-counting sketches to these richer aggregation tasks.
Contribution
It presents the first composable, double-logarithmic size sketches for all concave sublinear frequency statistics, combining theoretical optimality with practical simplicity.
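For context, here is a minimal sketch of the kind of composable, double-logarithmic distinct-count sketch the paper extends. It is a standard HyperLogLog, not the paper's construction for concave sublinear statistics. The class name `HyperLogLog`, the SHA-1-based 64-bit hash, and the precision `p = 10` are illustrative assumptions. Each register stores the largest "rank" (position of the first 1-bit) seen among the hashed keys mapped to it. With a 64-bit hash that rank is at most 64 - p + 1, so 6 bits per register suffice. This is the sense in which register width grows like log log n. Merging two sketches takes the element-wise maximum of their registers, which is what makes the sketch composable across streams, workers, or machines.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch (illustrative; not the paper's method)."""

    def __init__(self, p=10):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m   # each register holds a small integer rank

    @staticmethod
    def _hash64(key):
        # Deterministic 64-bit hash taken from SHA-1 (an assumption; any good hash works).
        digest = hashlib.sha1(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, key):
        h = self._hash64(key)
        width = 64 - self.p
        idx = h >> width                  # top p bits select a register
        rest = h & ((1 << width) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (equals width + 1 when rest == 0).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Composability: the sketch of a union is the element-wise max of registers.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision p")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias-correction constant for m >= 128
        z = sum(2.0 ** -r for r in self.registers)
        e = alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros > 0:     # small-range correction (linear counting)
            e = self.m * math.log(self.m / zeros)
        return e


# Usage: sketch two overlapping partitions separately, then merge.
a = HyperLogLog(10)
b = HyperLogLog(10)
for i in range(6000):
    a.add(i)
for i in range(4000, 10000):
    b.add(i)
union = a.merge(b)
print(round(union.estimate()))  # roughly 10000 (standard error about 1.04 / sqrt(1024))
```

Because merging is a register-wise max, the merged sketch is identical to one built directly over the union of the inputs, regardless of how keys were partitioned.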
Findings
The sketches have double-logarithmic size.
They approximate all concave sublinear statistics, including logarithms, low frequency moments, and capping statistics.
The approach is both theoretically sound and simple to implement in practice.
Abstract
One of the most common statistics computed over data elements is the number of distinct keys. A thread of research pioneered by Flajolet and Martin three decades ago culminated in the design of optimal approximate counting sketches, which have size that is double logarithmic in the number of distinct keys and provide estimates with a small relative error. Moreover, the sketches are composable, and thus suitable for streamed, parallel, or distributed computation. We consider here all statistics of the frequency distribution of keys in which the contribution of each key to the aggregate is concave and grows (sub)linearly with its frequency. These fundamental aggregations are very common in the analysis of text, graphs, and logs, and include logarithms, low frequency moments, and capping statistics. We design composable sketches of double-logarithmic size for all concave sublinear statistics. Our…
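To make the class of statistics concrete, the following sketch computes them exactly. Each statistic has the form of a sum over distinct keys x of f(w_x), where w_x is the frequency of key x and f is concave and grows at most linearly. The particular functions below, and the names `cap`, `log1p_stat`, `moment`, and `frequency_statistic`, are illustrative choices, not the paper's notation. Distinct counting is the special case f(w) = min(w, 1). Exact computation needs a frequency table whose size is linear in the number of distinct keys; avoiding that cost is the point of the sketches described above.

```python
import math
from collections import Counter


def cap(T):
    # Capping statistic cap_T(w) = min(w, T); cap(1) is the distinct count.
    return lambda w: min(w, T)


def log1p_stat(w):
    # Logarithmic statistic log(1 + w).
    return math.log1p(w)


def moment(p):
    # Low frequency moment w^p with 0 < p <= 1.
    return lambda w: w ** p


def frequency_statistic(elements, f):
    """Exact value of sum over distinct keys x of f(w_x), with w_x the frequency of x."""
    freqs = Counter(elements)
    return sum(f(w) for w in freqs.values())


stream = ["a", "b", "a", "c", "a", "b", "d"]   # frequencies: a=3, b=2, c=1, d=1
print(frequency_statistic(stream, cap(1)))     # 4 (distinct keys)
print(frequency_statistic(stream, cap(2)))     # 6 (2 + 2 + 1 + 1)
print(frequency_statistic(stream, log1p_stat))
print(frequency_statistic(stream, moment(0.5)))
```

Each f here dampens the contribution of heavy keys. This is why these statistics sit between distinct counting (f(w) = min(w, 1)) and plain element counting (f(w) = w).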
Taxonomy
Topics: Data Visualization and Analytics · Data Management and Algorithms · Anomaly Detection Techniques and Applications
