TL;DR
This paper unifies eight formulations of sparse PCA using an alternating maximization framework, introduces three novel formulations, and provides highly efficient parallel implementations for large-scale data analysis.
Contribution
It presents a unifying reformulation of the eight sparse PCA models, shows that the alternating maximization method used to solve it is equivalent to GPower, and develops 24 parallel algorithms including GPU and cluster implementations.
Findings
The GPU implementation is up to 100 times faster than the serial code.
The cluster code can solve a 357 GB problem in about a minute.
Three new formulations of sparse PCA are introduced and analyzed.
Abstract
Given a multivariate data set, sparse principal component analysis (SPCA) aims to extract several linear combinations of the variables that together explain the variance in the data as much as possible, while controlling the number of nonzero loadings in these combinations. In this paper we consider 8 different optimization formulations for computing a single sparse loading vector; these are obtained by combining the following factors: we employ two norms for measuring variance (L2, L1) and two sparsity-inducing norms (L0, L1), which are used in two different ways (constraint, penalty). Three of our formulations, notably the one with L0 constraint and L1 variance, have not been considered in the literature. We give a unifying reformulation which we propose to solve via a natural alternating maximization (AM) method. We show that the AM method is nontrivially equivalent to GPower…
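To make the alternating maximization idea concrete, here is a minimal sketch for one of the eight formulations: L2 variance with an L0 constraint, i.e. maximizing ||Ax||_2 over unit-norm x with at most s nonzeros. It assumes the standard bilinear rewriting max over unit y and feasible x of y^T A x, and alternates the two closed-form updates. The function names `hard_threshold` and `sparse_pca_am`, the random initialization, and the fixed iteration count are illustrative assumptions, not the paper's code or its parallel implementations.

```python
import numpy as np


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero the rest (s >= 1)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out


def sparse_pca_am(A, s, iters=200, seed=0):
    """Sketch of alternating maximization for
    max ||A x||_2  subject to  ||x||_2 <= 1, ||x||_0 <= s.

    Assumption: uses the bilinear form y^T A x with ||y||_2 <= 1 and
    alternates exact maximization over y and over x.
    """
    rng = np.random.default_rng(seed)
    _, p = A.shape

    # Random feasible starting point (an illustrative choice).
    x = hard_threshold(rng.standard_normal(p), s)
    x /= np.linalg.norm(x)

    for _ in range(iters):
        # Step 1: for fixed x, the best unit-norm y is A x normalized.
        y = A @ x
        y_norm = np.linalg.norm(y)
        if y_norm == 0.0:
            break
        y /= y_norm

        # Step 2: for fixed y, the best s-sparse unit-norm x keeps the
        # s largest-magnitude entries of A^T y and normalizes them.
        z = hard_threshold(A.T @ y, s)
        z_norm = np.linalg.norm(z)
        if z_norm == 0.0:
            break
        x = z / z_norm

    return x
```

Calling `sparse_pca_am(A, s)` on an n-by-p data matrix returns a loading vector with at most `s` nonzero entries; `np.linalg.norm(A @ x)` then gives the objective value it attains. Like any local method of this kind, the result depends on the starting point, so this sketch makes no claim of global optimality.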
