Sparse principal component analysis and iterative thresholding

Zongming Ma

arXiv:1112.2432·math.ST·May 27, 2013

TL;DR

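This paper introduces an iterative thresholding method for sparse PCA. It estimates principal subspaces when the number of features p is comparable to, or much larger than, the sample size n, a regime where classical PCA behaves poorly. The method comes with consistency and optimality guarantees under a spiked covariance model and performs competitively in simulations.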

Contribution

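It proposes a new iterative thresholding algorithm for estimating principal subspaces when the leading eigenvectors are sparse. Under a spiked covariance model, the algorithm recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings.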

Findings

01 Consistent recovery of the principal subspace and leading eigenvectors under a spiked covariance model

02 Optimal recovery in a range of high-dimensional sparse settings

03 Competitive performance in simulated examples

Abstract

Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features p is comparable to, or even much larger than, the sample size n. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
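
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way an iterative thresholding scheme for sparse principal subspaces can be organized: multiply the current basis by the sample covariance matrix, threshold small entries to zero, and re-orthonormalize with a QR decomposition. The function names, the soft-thresholding rule, the variance-based initialization, and the threshold value are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink every entry of x toward zero by t; entries with |x| <= t become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def initial_basis(S, m, k):
    """Hypothetical initializer: keep the k coordinates with the largest sample
    variance and take the top-m eigenvectors of that k-by-k submatrix."""
    p = S.shape[0]
    idx = np.argsort(np.diag(S))[-k:]
    _, V = np.linalg.eigh(S[np.ix_(idx, idx)])  # eigenvalues in ascending order
    Q0 = np.zeros((p, m))
    Q0[idx, :] = V[:, ::-1][:, :m]              # top-m eigenvectors first
    return Q0


def iterative_thresholding_pca(S, Q0, threshold, n_iter=50):
    """Orthogonal iteration with an entrywise thresholding step (illustrative)."""
    Q = Q0
    for _ in range(n_iter):
        T = soft_threshold(S @ Q, threshold)  # multiply, then threshold
        if not np.any(T):                     # everything was thresholded away
            break
        Q, _ = np.linalg.qr(T)                # re-orthonormalize the columns
    return Q


def simulate_single_spike(n, p, s, lam, seed=0):
    """Draw n samples with covariance lam * q q^T + I, where the unit vector q
    has s equal nonzero entries (a single-spike covariance model)."""
    rng = np.random.default_rng(seed)
    q = np.zeros(p)
    q[:s] = 1.0 / np.sqrt(s)
    u = rng.standard_normal(n)
    X = np.sqrt(lam) * np.outer(u, q) + rng.standard_normal((n, p))
    return X, q


# Example run; the sizes and the threshold are arbitrary illustrative choices.
X, q = simulate_single_spike(n=200, p=300, s=10, lam=10.0)
S = X.T @ X / X.shape[0]
Q_hat = iterative_thresholding_pca(S, initial_basis(S, m=1, k=20), threshold=1.5)
print("abs inner product with true eigenvector:", abs(Q_hat[:, 0] @ q))
print("nonzero entries in estimate:", np.count_nonzero(Q_hat))
```

In this sketch the threshold plays the role of a tuning parameter: too small and noise coordinates survive each iteration, too large and genuine signal coordinates are zeroed out. A data-driven choice would be needed in practice.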
