A Sparse SVD Method for High-dimensional Data
Dan Yang, Zongming Ma, Andreas Buja

TL;DR
This paper introduces a sparse SVD algorithm for high-dimensional, noisy data that runs faster than two existing sparse SVD methods and is at least comparable to them in statistical accuracy, with the largest computational gains when the underlying signal is sparse.
Contribution
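The authors develop a sparse SVD approach based on thresholded subspace iterations that compute the singular vectors simultaneously and estimate the thresholding parameters from the data rather than by cross-validation. The method is faster than existing methods and comparable to them in accuracy.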
Findings
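The algorithm is computationally faster than the two existing sparse SVD methods it is compared against.
Statistically, it performs at least comparably to the better of the two competing algorithms.
With sparse initialization, it can outperform the ordinary SVD on the full data table when the signal is sparse.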
Abstract
We present a new computational approach to approximating a large, noisy data table by a low-rank matrix with sparse singular vectors. The approximation is obtained from thresholded subspace iterations that produce the singular vectors simultaneously, rather than successively as in competing proposals. We introduce novel ways to estimate thresholding parameters which obviate the need for computationally expensive cross-validation. We also introduce a way to sparsely initialize the algorithm for computational savings that allow our algorithm to outperform the vanilla SVD on the full data table when the signal is sparse. A comparison with two existing sparse SVD methods suggests that our algorithm is computationally always faster and statistically always at least comparable to the better of the two competing algorithms.
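The idea of thresholded subspace iteration can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names `soft_threshold` and `sparse_svd_sketch`, the use of soft thresholding, the QR orthonormalization step, the initialization from an ordinary SVD, and the hand-picked thresholds `thr_u` and `thr_v` are all assumptions made for illustration. The paper instead estimates the thresholds from the data and initializes sparsely.

```python
import numpy as np

def soft_threshold(A, t):
    """Shrink every entry of A toward zero by t; entries with |a| <= t become 0."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def sparse_svd_sketch(X, rank, thr_u, thr_v, n_iter=20):
    """Illustrative thresholded subspace iteration (hypothetical helper)."""
    # Assumption: start from the ordinary SVD; the paper uses a sparse initialization.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:rank].T
    for _ in range(n_iter):
        # Multiply, threshold, then re-orthonormalize; all `rank` vectors are updated together.
        U, _ = np.linalg.qr(soft_threshold(X @ V, thr_u))
        V, _ = np.linalg.qr(soft_threshold(X.T @ U, thr_v))
    # Rotate within the two subspaces so that U.T @ X @ V becomes diagonal.
    Ub, d, Vbt = np.linalg.svd(U.T @ X @ V)
    return U @ Ub, d, V @ Vbt.T

# Toy data: a rank-one sparse signal plus Gaussian noise (assumed setup).
rng = np.random.default_rng(0)
n, p = 100, 80
u_true = np.zeros(n)
u_true[:10] = 1 / np.sqrt(10)
v_true = np.zeros(p)
v_true[:8] = 1 / np.sqrt(8)
X = 20.0 * np.outer(u_true, v_true) + 0.1 * rng.standard_normal((n, p))

U, d, V = sparse_svd_sketch(X, rank=1, thr_u=1.0, thr_v=1.0)
print("nonzeros in u:", np.count_nonzero(U), "nonzeros in v:", np.count_nonzero(V))
print("estimated singular value:", d)
```

In this toy setting the thresholds are chosen by hand to sit well above the noise level and well below the signal entries. Choosing them automatically, without cross-validation, is what the paper contributes.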
Taxonomy
Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques
