Improved Concentration Bounds for Count-Sketch
Gregory T. Minton, Eric Price

TL;DR
This paper refines the analysis of Count-Sketch, showing that individual coordinate estimates are more accurate than previously proven, leading to stronger concentration bounds and improved sparse recovery performance.
Contribution
The authors provide a tighter analysis of Count-Sketch, demonstrating that most coordinates have significantly less error, and establish new concentration bounds for set estimates and sparse recovery.
Findings
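Most coordinate estimates have substantially less error than the standard bound of ||x_tail||_2^2 / k guarantees
New concentration bounds for set estimates and sparse recovery follow from the improved point estimates
Empirical results confirm the theoretical bounds and show that the constants involved are small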
Abstract
We present a refined analysis of the classic Count-Sketch streaming heavy hitters algorithm [CCF02]. Count-Sketch uses O(k log n) linear measurements of a vector x in R^n to give an estimate x' of x. The standard analysis shows that this estimate x' satisfies ||x'-x||_infty^2 < ||x_tail||_2^2 / k, where x_tail is the vector containing all but the largest k coordinates of x. Our main result is that most of the coordinates of x' have substantially less error than this upper bound; namely, for any c < O(log n), we show that each coordinate i satisfies (x'_i - x_i)^2 < (c/log n) ||x_tail||_2^2/k with probability 1-2^{-Omega(c)}, as long as the hash functions are fully independent. This subsumes the previous bound and is optimal for all c. Using these improved point estimates, we prove a stronger concentration result for set estimates by first analyzing the covariance matrix and then using a…
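For readers unfamiliar with the data structure, the following is a minimal sketch of Count-Sketch point estimation in the spirit of [CCF02], not the authors' code. The function names `count_sketch` and `estimate`, the `seed` parameter, and the use of explicitly stored random tables in place of hash functions (to mimic the fully independent setting assumed in the abstract) are all illustrative assumptions. Taking `rows` on the order of log n and `cols` on the order of k gives the O(k log n) linear measurements mentioned above.

```python
import random
from statistics import median


def count_sketch(x, rows, cols, seed=0):
    """Sketch vector x into a rows-by-cols table of linear measurements.

    Assumption: bucket and sign choices are stored as explicit random
    tables, standing in for fully independent hash functions.
    """
    rng = random.Random(seed)
    n = len(x)
    # For each row, every coordinate gets a random bucket and a random sign.
    buckets = [[rng.randrange(cols) for _ in range(n)] for _ in range(rows)]
    signs = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]
    table = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for i, xi in enumerate(x):
            table[r][buckets[r][i]] += signs[r][i] * xi
    return table, buckets, signs


def estimate(i, table, buckets, signs):
    """Estimate x[i] as the median over rows of the signed bucket value."""
    return median(
        signs[r][i] * table[r][buckets[r][i]] for r in range(len(table))
    )


# Usage: a vector with a single heavy coordinate and an all-zero tail.
x = [0.0] * 100
x[7] = 50.0
table, buckets, signs = count_sketch(x, rows=5, cols=8, seed=1)
heavy_estimate = estimate(7, table, buckets, signs)
```

In this usage example the tail x_tail is zero for k = 1, so the estimate of the heavy coordinate is exact. With a nonzero tail, each row's estimate is perturbed by the other coordinates that collide in the same bucket, and the median over rows is what the concentration bounds in the abstract describe.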
