Streaming Algorithms for High-Dimensional Robust Statistics
Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas

TL;DR
This paper introduces the first efficient streaming algorithms for high-dimensional robust statistics, achieving near-optimal memory usage and error guarantees for tasks like mean estimation, covariance estimation, and regression.
Contribution
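It develops the first streaming algorithms for high-dimensional robust estimation whose space is nearly linear in the dimension (up to logarithmic factors), improving over previous methods that stored the entire dataset and therefore required memory at least quadratic in the dimension.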
Findings
Efficient single-pass streaming algorithm for robust mean estimation with near-optimal error.
Streaming algorithms for robust covariance estimation and regression with near-optimal space complexity.
Space complexity nearly linear in the dimension across multiple high-dimensional robust statistical tasks.
Abstract
We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust estimation tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks,…
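To make the setting concrete, the following Python sketch simulates a data stream from Huber's contamination model and runs the obvious single-pass estimator, the running mean. This is an illustration of the problem, not the authors' algorithm. The running mean needs only O(d) memory, but a single cluster of outliers shifts it by roughly ε times the outliers' distance from the true mean. Robust estimators are designed to avoid exactly this failure. The function names, the Gaussian inlier distribution N(0, I_d), and the fixed outlier point are hypothetical choices made for illustration. The paper's strengthened contamination model is not modeled here.

```python
import math
import random
from typing import Iterator, List


def huber_stream(n: int, d: int, eps: float, outlier: List[float],
                 seed: int = 0) -> Iterator[List[float]]:
    """Hypothetical illustration of Huber's contamination model:
    each of the n points is, independently, an inlier drawn from N(0, I_d)
    with probability 1 - eps, or a copy of a fixed outlier with probability eps."""
    rng = random.Random(seed)
    for _ in range(n):
        if rng.random() < eps:
            yield list(outlier)
        else:
            yield [rng.gauss(0.0, 1.0) for _ in range(d)]


def streaming_mean(stream: Iterator[List[float]], d: int) -> List[float]:
    """Single-pass running mean: keeps only d floats and a counter (O(d) memory),
    but it is NOT robust, since every point, outlier or not, gets equal weight."""
    mean = [0.0] * d
    count = 0
    for x in stream:
        count += 1
        for i in range(d):
            # Incremental update: mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
            mean[i] += (x[i] - mean[i]) / count
    return mean


def l2(v: List[float]) -> float:
    """Euclidean norm, used here as the error from the true mean 0."""
    return math.sqrt(sum(t * t for t in v))


if __name__ == "__main__":
    d = 20
    far_point = [10.0] * d  # hypothetical outlier location, distance 10*sqrt(20) from 0
    clean = streaming_mean(huber_stream(1000, d, 0.0, far_point, seed=1), d)
    dirty = streaming_mean(huber_stream(1000, d, 0.1, far_point, seed=1), d)
    print("error without contamination:", round(l2(clean), 3))
    print("error with eps = 0.1:       ", round(l2(dirty), 3))
```

With d = 20, ε = 0.1 and every outlier placed at the all-10s vector, the contaminated estimate should land roughly 0.1 × 10√20 ≈ 4.5 away from the true mean 0, while the uncontaminated run stays close to 0. Moving the outlier farther away makes the error arbitrarily large. The classical robust alternatives avoid this but, as the abstract notes, previous efficient high-dimensional methods did so only by storing the whole dataset.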
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Markov Chains and Monte Carlo Methods
