Online Row Sampling
Michael B. Cohen, Cameron Musco, Jakub Pachocki

TL;DR
This paper introduces simple online algorithms for row sampling to produce spectral approximations of large matrices with minimal memory, suitable for streaming data, and proves their optimality.
Contribution
It presents novel online algorithms for row sampling that operate with low memory and provide provably optimal spectral approximations of matrices.
Findings
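The simplest algorithm achieves a spectral approximation with multiplicative error 1 + ε and additive error δ using O(d log d log(ε‖A‖₂²/δ)/ε²) online samples; a second algorithm uses O(d²) memory and needs only O(d log(ε‖A‖₂²/δ)/ε²) samples, which the paper shows is optimal.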
Memory usage can be reduced to near the size of the spectral approximation.
The methods reveal new theoretical insights into leverage score based matrix approximation.
Abstract
Finding a small spectral approximation for a tall n × d matrix A is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of A. Row sampling improves interpretability, saves space when A is sparse, and preserves row structure, which is especially important, for example, when A represents a graph. However, correctly sampling rows from A can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the outputted approximation [KL13, KLM+14]. Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of A one by one and…
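The online setting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `online_row_sample`, the regularizer `lam = delta / eps`, and the default oversampling constant `c` are assumptions chosen for illustration. Each arriving row is scored by an approximate ridge leverage score computed only from the rows kept so far, then kept with probability proportional to that score and rescaled; a decision is never revisited.

```python
import numpy as np

def online_row_sample(rows, d, eps=0.5, delta=1e-3, c=None, seed=0):
    """Keep or discard each row as it arrives (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    lam = delta / eps                      # assumed ridge parameter
    if c is None:
        c = 8.0 * np.log(max(d, 2)) / eps**2   # assumed oversampling constant
    # gram = lam * I + sum of outer products of the kept, rescaled rows
    gram = lam * np.eye(d)
    kept = []
    for a in rows:
        a = np.asarray(a, dtype=float)
        # Approximate online ridge leverage score, capped at 1
        score = min((1.0 + eps) * float(a @ np.linalg.solve(gram, a)), 1.0)
        p = min(c * score, 1.0)
        if rng.random() < p:               # irrevocable keep decision
            scaled = a / np.sqrt(p)        # rescale so the estimate is unbiased
            kept.append(scaled)
            gram += np.outer(scaled, scaled)
    return np.vstack(kept) if kept else np.empty((0, d))

# Usage: sample rows from a tall random matrix
A = np.random.default_rng(1).normal(size=(200, 4))
A_tilde = online_row_sample(A, d=4)
```

The sketch stores a d × d matrix and solves a linear system per row for clarity; it makes no claim to match the memory or sample bounds proved in the paper.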
