On Sketching Trimmed Statistics
Honghao Lin, Hoai-An Nguyen, David P. Woodruff

TL;DR
This paper introduces space-efficient linear sketching methods for estimating trimmed statistics of frequency vectors, enabling robust, fast, and memory-efficient analysis in streaming and distributed data settings.
Contribution
It provides new conditions and algorithms for sketching trimmed frequency statistics, including optimal error guarantees and extensions to various related problems.
Findings
Achieves poly(1/ε, log n) space for approximating top-k Fp moments when k ≥ n/polylog n.
Establishes necessary conditions relating the k-th largest frequency to tail mass for general k.
Empirically demonstrates reduced space usage compared to Count-Sketch with comparable accuracy.
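For readers unfamiliar with the Count-Sketch baseline named above, the following is a minimal sketch of the standard data structure, not the paper's algorithm. The class name `CountSketch`, its parameters, and the explicit per-coordinate hash tables are illustrative assumptions; a space-efficient implementation would use compact hash functions instead of stored tables. The point it illustrates is linearity: sketches built with the same hashes can be added entrywise, which is what makes them usable in streaming and distributed settings.

```python
import random
from statistics import median


class CountSketch:
    """Textbook Count-Sketch over coordinates 0..n-1 (illustrative only)."""

    def __init__(self, rows, width, n, seed=0):
        rng = random.Random(seed)
        self.rows, self.width = rows, width
        # Assumption for clarity: hashes are stored as explicit tables.
        # Real implementations use pairwise-independent hash functions.
        self.bucket = [[rng.randrange(width) for _ in range(n)] for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]
        self.table = [[0] * width for _ in range(rows)]

    def update(self, i, delta):
        """Apply the stream update x[i] += delta."""
        for r in range(self.rows):
            self.table[r][self.bucket[r][i]] += self.sign[r][i] * delta

    def estimate(self, i):
        """Median over rows of the signed bucket holding coordinate i."""
        return median(
            self.sign[r][i] * self.table[r][self.bucket[r][i]]
            for r in range(self.rows)
        )

    def merge(self, other):
        """Add another sketch built with the same seed and dimensions."""
        for r in range(self.rows):
            for c in range(self.width):
                self.table[r][c] += other.table[r][c]


# Two machines sketch disjoint parts of the stream, then combine.
left = CountSketch(rows=5, width=16, n=100, seed=1)
right = CountSketch(rows=5, width=16, n=100, seed=1)
left.update(3, 10)
right.update(3, 2)
left.merge(right)  # now identical to a single sketch that saw both updates
```

Merging is only meaningful when both sketches share the same seed and dimensions, since the bucket and sign assignments must match.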
Abstract
We present space-efficient linear sketches for estimating trimmed statistics of an n-dimensional frequency vector x, e.g., the sum of p-th powers of the largest k frequencies (i.e., entries) in absolute value, or the k-trimmed vector, which excludes the top and bottom k frequencies. This is called the Fp moment of the trimmed vector. Trimmed measures are used in robust estimation, as seen in the R programming language's 'trim.var' function and the 'trim' parameter in the mean function. Linear sketches improve time and memory efficiency and are applicable to streaming and distributed settings. We initiate the study of sketching these statistics and give a new condition for capturing their space complexity. When k ≥ n/polylog n, we give a linear sketch using poly(1/ε, log n) space which provides a (1 ± ε) approximation to the top-k …
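To make the two statistics in the abstract concrete, here is a small exact (non-sketched) reference computation that stores the whole vector. The function names `top_k_fp` and `k_trimmed_fp` are hypothetical, and ordering the entries by absolute value for the k-trimmed case is an assumption, since the truncated abstract does not spell out the ordering. A linear sketch would approximate these quantities in far less space.

```python
def _power(m, p):
    # Treat 0**0 as 0 so that p = 0 counts only nonzero entries.
    return m ** p if m != 0 else 0


def top_k_fp(x, k, p):
    """Sum of |x_i|^p over the k entries of x largest in absolute value."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(_power(m, p) for m in mags[:k])


def k_trimmed_fp(x, k, p):
    """Fp moment of x after dropping the top k and bottom k entries.

    Assumption: entries are ranked by absolute value.
    """
    mags = sorted(abs(v) for v in x)
    if 2 * k >= len(mags):
        return 0  # nothing is left after trimming
    return sum(_power(m, p) for m in mags[k:len(mags) - k])


x = [5, -3, 0, 2, -8, 1]
print(top_k_fp(x, k=2, p=2))      # 8^2 + 5^2 = 89
print(k_trimmed_fp(x, k=1, p=2))  # 1 + 4 + 9 + 25 = 39
```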
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Stochastic Gradient Optimization Techniques · Complexity and Algorithms in Graphs
