Near-Optimal Entrywise Sampling for Data Matrices
Dimitris Achlioptas, Zohar Karnin, Edo Liberty

TL;DR
This paper introduces simple, efficient, and provably near-optimal sampling methods for building sparse matrix sketches that minimize spectral-norm error. The methods suit streaming data, and the resulting sketches are highly compressible.
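The objective here, the spectral-norm error ||A - B||_2 between a matrix A and its sketch B, can be estimated without a full SVD by running power iteration on (A - B)^T (A - B). The following is a minimal pure-Python sketch that assumes both matrices are small enough to hold densely as lists of rows. The helper names (`spectral_norm`, `spectral_error`) are hypothetical and do not come from the paper.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # dense matrix stored as a list of rows


def _matvec(m: Matrix, v: List[float]) -> List[float]:
    """Return m @ v."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def _rmatvec(m: Matrix, u: List[float]) -> List[float]:
    """Return m^T @ u."""
    n_cols = len(m[0])
    out = [0.0] * n_cols
    for row, ui in zip(m, u):
        for k in range(n_cols):
            out[k] += row[k] * ui
    return out


def spectral_norm(m: Matrix, iters: int = 200, seed: int = 0) -> float:
    """Estimate ||m||_2 (largest singular value) by power iteration on m^T m."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(m[0]))]  # random start vector
    for _ in range(iters):
        w = _rmatvec(m, _matvec(m, v))  # w = m^T m v
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0  # m^T m v = 0; for a random start this means m is zero
        v = [x / norm_w for x in w]  # renormalize to a unit vector
    # v now approximates the top right singular vector, so ||m v|| ~ ||m||_2.
    return math.sqrt(sum(x * x for x in _matvec(m, v)))


def spectral_error(a: Matrix, b: Matrix) -> float:
    """Spectral-norm error ||a - b||_2 of a sketch b of a."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return spectral_norm(diff)
```

For example, `spectral_error([[1.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 0.0]])` returns 2.0 up to floating-point error. Power iteration converges at a rate governed by the ratio of the top two singular values, so more iterations may be needed when they are close.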
Contribution
It presents closed-form, computationally simple sampling distributions for matrix sketching that work in the streaming model and are provably competitive with the optimal offline distribution, which may be complex and impractical to compute.
Findings
Sampling distributions are computationally simple and have closed forms.
Methods work efficiently in streaming settings, with O(1) computation per non-zero entry.
Resulting sketches are sparse and highly compressible.
Abstract
We consider the problem of selecting non-zero entries of a matrix $A$ in order to produce a sparse sketch of it, $B$, that minimizes $\|A - B\|_2$. For large $m \times n$ matrices, such that $n \gg m$ (for example, representing $n$ observations over $m$ attributes), we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding $A$. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with $O(1)$ computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex…
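The abstract does not spell out the distributions, so the following is only a minimal sketch of the general streaming mechanics under stated assumptions, not the authors' method. It swaps in plain L1 (magnitude-proportional) sampling as a stand-in. Each non-zero A[i, j] arriving in the stream is kept independently with probability p = min(1, s * |A[i, j]| / ||A||_1) and rescaled to A[i, j] / p, so the sketch B is an unbiased estimate of A. The sketch assumes the entrywise L1 norm ||A||_1 is known before the stream starts, which is one example of "minimal information" about A. The function name `l1_stream_sketch` and the sample-budget parameter `s` are hypothetical.

```python
import random
from typing import Dict, Iterable, Tuple

Entry = Tuple[int, int, float]  # (row, col, value) of one non-zero of A


def l1_stream_sketch(
    stream: Iterable[Entry], l1_norm: float, s: float, seed: int = 0
) -> Dict[Tuple[int, int], float]:
    """Sparsify non-zeros arriving in arbitrary order, with O(1) work each.

    Stand-in distribution (plain L1 sampling, NOT the paper's): keep A[i, j]
    with probability p = min(1, s * |A[i, j]| / l1_norm), where l1_norm is the
    entrywise L1 norm of A, assumed known before the stream starts. Kept
    entries are rescaled to A[i, j] / p, so E[B[i, j]] = A[i, j].
    """
    rng = random.Random(seed)
    sketch: Dict[Tuple[int, int], float] = {}
    for i, j, a in stream:
        if a == 0.0:
            continue  # zeros are never sampled
        p = min(1.0, s * abs(a) / l1_norm)
        if rng.random() < p:  # one coin flip per non-zero: O(1) work
            sketch[(i, j)] = a / p  # inverse-probability rescaling
    return sketch


if __name__ == "__main__":
    entries = [(0, 0, 4.0), (0, 1, -1.0), (1, 0, 0.5), (1, 2, 2.5)]
    l1 = sum(abs(a) for _, _, a in entries)  # 8.0
    sketch = l1_stream_sketch(entries, l1, s=2.0, seed=1)
    # Every entry kept with p < 1 has magnitude l1 / s = 4.0 here.
    print(sketch)
```

In this stand-in, every entry kept with p < 1 has magnitude exactly ||A||_1 / s. The sketch can therefore be stored as positions and signs plus one shared scale, which illustrates one way a sampled sketch can be highly compressible. The expected number of kept entries is at most s.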
Taxonomy
Topics: Random Matrices and Applications · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques
