Dimensionality Reduction of Massive Sparse Datasets Using Coresets
Dan Feldman, Mikhail Volkov, Daniela Rus

TL;DR
This paper introduces a practical coreset-based method with performance guarantees for dimensionality reduction of large sparse datasets, enabling efficient low-rank approximations such as the reduced SVD.
Contribution
It presents a novel deterministic technique for constructing a coreset whose size is independent of the input matrix, applicable to massive sparse datasets.
Findings
Efficient computation of low-rank approximations for large sparse matrices.
Coreset size is independent of the input matrix, depending only on rank and error parameters.
Demonstrated a practical application to the Wikipedia dataset for computing the reduced SVD.
Abstract
In this paper we present a practical solution with performance guarantees to the problem of dimensionality reduction for very large scale sparse matrices. We show applications of our approach to computing the low rank approximation (reduced SVD) of such matrices. Our solution uses a coreset, which is a subset of scaled rows from the input matrix that approximates the sum of squared distances from its rows to every k-dimensional subspace in R^d, up to a factor of 1 ± ε. An open theoretical problem has been whether we can compute such a coreset that is independent of the input matrix and also a weighted subset of its rows. An open practical problem has been whether we can compute a non-trivial approximation to the reduced SVD of very large databases such as the Wikipedia document-term matrix in a reasonable time. We answer this question…
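The quantity a coreset must preserve can be made concrete with a short sketch. The code below computes the weighted sum of squared distances from a set of rows to a subspace, and builds a small weighted row subset to compare against the full matrix. The function names `subspace_cost` and `sampled_row_coreset` are hypothetical, and the subset is chosen by randomized squared-norm importance sampling, which is only an illustrative stand-in: it is not the deterministic construction of the paper and does not carry the paper's guarantee.

```python
import numpy as np


def subspace_cost(A, V, w=None):
    """Weighted sum of squared distances from the rows of A to span(V).

    A is (n, d); V is (d, k) with orthonormal columns; w holds one
    non-negative weight per row (defaults to all ones).
    """
    if w is None:
        w = np.ones(A.shape[0])
    residual = A - (A @ V) @ V.T  # component of each row orthogonal to span(V)
    return float(np.sum(w * np.sum(residual ** 2, axis=1)))


def sampled_row_coreset(A, m, rng):
    """Assumption: a randomized stand-in, NOT the paper's deterministic method.

    Samples m row indices with probability proportional to squared row norm
    and returns (indices, weights), with weights 1 / (m * p_i) so that the
    weighted cost is an unbiased estimate of the full cost.
    """
    sq_norms = np.sum(A ** 2, axis=1)
    p = sq_norms / sq_norms.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    weights = 1.0 / (m * p[idx])
    return idx, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5000, 20))
    # A random 3-dimensional subspace, represented by an orthonormal basis.
    V, _ = np.linalg.qr(rng.standard_normal((20, 3)))
    idx, w = sampled_row_coreset(A, 500, rng)
    ratio = subspace_cost(A[idx], V, w) / subspace_cost(A, V)
    print("coreset cost / full cost for this subspace:", ratio)
```

A coreset in the sense of the abstract would keep this ratio within 1 ± ε for every k-dimensional subspace simultaneously; the sampling sketch only estimates the cost for one subspace at a time.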
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Face and Expression Recognition · Stochastic Gradient Optimization Techniques
