Beating CountSketch for Heavy Hitters in Insertion Streams
Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, David P. Woodruff

TL;DR
This paper introduces a space-efficient algorithm for identifying heavy hitters in data streams, surpassing the traditional CountSketch method by using Gaussian process techniques to reduce space complexity.
Contribution
It presents the first algorithm achieving $O(\log n \log\log n)$ bits of space for heavy hitter detection, improving previous bounds and introducing new methods for $F_2$ and $\ell_\infty$ norm estimation.
Findings
Achieves $O(\log n \log\log n)$ bits of space for heavy hitters.
Provides the first algorithm that estimates $F_2$ simultaneously at all points in the stream using low space.
Resolves an open problem for $\ell_\infty$ norm estimation in insertion streams.
Abstract
Given a stream $p_1, \ldots, p_m$ of items from a universe $\mathcal{U}$, which, without loss of generality, we identify with the set of integers $\{1, 2, \ldots, n\}$, we consider the problem of returning all $\ell_2$-heavy hitters, i.e., those items $j$ for which $f_j \geq \epsilon \sqrt{F_2}$, where $f_j$ is the number of occurrences of item $j$ in the stream, and $F_2 = \sum_{i \in [n]} f_i^2$. Such a guarantee is considerably stronger than the $\ell_1$-guarantee, which finds those $j$ for which $f_j \geq \epsilon m$. In 2002, Charikar, Chen, and Farach-Colton suggested the CountSketch data structure, which finds all such $j$ using $\Theta(\log^2 n)$ bits of space (for constant $\epsilon > 0$). The only known lower bound is $\Omega(\log n)$ bits of space, which comes from the need to specify the identities of the items found. In this paper we show it is possible to achieve…
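To make the two guarantees in the abstract concrete, the following is a minimal sketch that computes both heavy-hitter sets from exact counts. It is not the paper's algorithm and uses linear space; the function name `heavy_hitters` and the example stream are assumptions chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def heavy_hitters(stream, eps):
    """Return (l1_heavy, l2_heavy) item sets using exact counts.

    Illustrative only: stores every count, unlike a small-space sketch.
    """
    freq = Counter(stream)
    m = sum(freq.values())                   # stream length m
    f2 = sum(c * c for c in freq.values())   # second moment F_2
    l1 = {j for j, c in freq.items() if c >= eps * m}
    l2 = {j for j, c in freq.items() if c >= eps * sqrt(f2)}
    return l1, l2


# Hypothetical stream: item 0 occurs 100 times, items 1..10000 once each.
stream = [0] * 100 + list(range(1, 10001))
l1, l2 = heavy_hitters(stream, eps=0.5)
print(l1, l2)
```

In this example item 0 falls far below the $\ell_1$ threshold $\epsilon m$ but clears the $\ell_2$ threshold $\epsilon \sqrt{F_2}$. Since $\sqrt{F_2} \leq m$ always holds, every $\ell_1$-heavy hitter is also an $\ell_2$-heavy hitter, which is the sense in which the $\ell_2$-guarantee is stronger.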
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Data Management and Algorithms
