Near-Optimal Entrywise Sampling for Data Matrices

Dimitris Achlioptas; Zohar Karnin; Edo Liberty

arXiv:1311.4643 · cs.LG · November 20, 2013 · 19 cites

Open Access

TL;DR

This paper introduces simple, efficient, and provably near-optimal sampling distributions for building sparse matrix sketches with small spectral-norm error. The methods work on streamed data, and the resulting sketches are highly compressible.

Contribution

It presents closed-form, computationally simple sampling distributions for matrix sketching that are stream-friendly and, under mild assumptions, provably competitive with the optimal offline distribution, whose probabilities may be complex.

Findings

01

Sampling distributions are computationally simple and have closed forms.

02

The methods work in streaming settings with O(1) computation per non-zero entry.

03

Resulting sketches are sparse and highly compressible.

Abstract

We consider the problem of selecting non-zero entries of a matrix $A$ in order to produce a sparse sketch of it, $B$, that minimizes $\|A - B\|_{2}$. For large $m \times n$ matrices, such that $n \gg m$ (for example, representing $n$ observations over $m$ attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding $A$. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with $O(1)$ computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex…
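To make the streaming setting concrete, here is a minimal Python sketch of entrywise sampling over a stream of non-zeros. It is an illustration only: the abstract does not state the paper's distributions, so the code assumes plain magnitude-proportional ($\ell_1$) sampling as a stand-in, assumes the entrywise $\ell_1$ norm of $A$ is known in advance, and assumes each position appears at most once in the stream. The function name `l1_stream_sketch` and the parameter `s` (a target sample budget) are hypothetical.

```python
import random
from typing import Dict, Iterable, Tuple

Entry = Tuple[int, int, float]  # (row, column, value) of one non-zero


def l1_stream_sketch(
    stream: Iterable[Entry],
    l1_norm: float,
    s: int,
    seed: int = 0,
) -> Dict[Tuple[int, int], float]:
    """Keep each non-zero independently with probability proportional to |value|.

    ASSUMPTION: magnitude-proportional sampling is a stand-in, not the
    distribution proposed in the paper. `l1_norm` is the sum of |value| over
    all non-zeros and is assumed to be known before the stream starts.
    """
    rng = random.Random(seed)
    sketch: Dict[Tuple[int, int], float] = {}
    for i, j, v in stream:
        if v == 0.0:
            continue
        # O(1) work per non-zero: one probability, one coin flip.
        p = min(1.0, s * abs(v) / l1_norm)
        if rng.random() < p:
            # Rescale by 1/p so that the expected value of the kept entry is v.
            sketch[(i, j)] = v / p
    return sketch


# Toy stream of four non-zeros; the entrywise l1 norm is 8.0.
entries = [(0, 0, 4.0), (0, 1, -1.0), (1, 2, 0.5), (1, 3, 2.5)]
l1 = sum(abs(v) for _, _, v in entries)
sketch = l1_stream_sketch(entries, l1, s=2)
```

Under this stand-in distribution, every entry kept with probability below 1 is stored as `sign(v) * l1_norm / s`, so the sketch holds very few distinct values. That gives some intuition for why the non-zeros of such a sketch can be highly compressible, although the paper's own argument may differ.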

Taxonomy

Topics: Random Matrices and Applications · Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques