Minimax bounds for sparse PCA with noisy high-dimensional data

Aharon Birnbaum; Iain M. Johnstone; Boaz Nadler; Debashis Paul

arXiv:1203.0967 · math.ST · March 6, 2012 · 5 cites

Open Access

TL;DR

This paper establishes fundamental limits on estimating sparse leading eigenvectors in high-dimensional noisy data and introduces a new two-stage coordinate selection method for improved estimation.

Contribution

It derives lower bounds on the minimax risk, under $l_{2}$ loss, for estimating sparse leading eigenvectors from noisy high-dimensional Gaussian data, and proposes a two-stage coordinate selection estimator.

Findings

01 Lower bounds reveal different sparsity regimes affecting estimation difficulty.

02 The proposed method outperforms existing techniques in certain regimes.

03 Results highlight the importance of sparsity structure in high-dimensional PCA.

Abstract

We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the $l_{2}$ loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors. We also propose a new method for estimating the eigenvectors by a two-stage coordinate selection scheme.
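
The abstract does not spell out the two-stage coordinate selection scheme, so the following is only a rough sketch of what such a scheme can look like, not the authors' estimator. It assumes a single-spike covariance $\Sigma = I + \lambda\theta\theta^{T}$ with unit noise variance and mean-zero data. The function name `two_stage_sparse_pc`, the threshold constants `alpha` and `beta`, and the form of both thresholds are hypothetical choices made for illustration. Stage 1 keeps coordinates with large sample variance; stage 2 adds coordinates whose sample covariance with the preliminary component is large; the final estimate is the leading eigenvector of the sample covariance restricted to the selected coordinates.

```python
import numpy as np


def two_stage_sparse_pc(X, alpha=3.0, beta=3.0):
    """Illustrative two-stage coordinate selection (hypothetical sketch).

    X     : (n, p) array of mean-zero observations, noise variance assumed 1.
    alpha : stage-1 threshold constant (assumed, not taken from the paper).
    beta  : stage-2 threshold constant (assumed, not taken from the paper).
    Returns the estimated leading eigenvector and the selected coordinates.
    """
    n, p = X.shape
    S = X.T @ X / n                      # sample covariance, mean assumed zero
    variances = np.diag(S)
    rate = np.log(p) / n

    # Stage 1: keep coordinates whose sample variance exceeds the noise level.
    thr1 = 1.0 + alpha * np.sqrt(rate)
    I1 = np.flatnonzero(variances > thr1)
    if I1.size == 0:                     # fall back to the largest variance
        I1 = np.array([int(np.argmax(variances))])

    # Preliminary leading eigenpair on the stage-1 coordinates.
    evals, evecs = np.linalg.eigh(S[np.ix_(I1, I1)])
    lead_val, v1 = evals[-1], evecs[:, -1]

    # Stage 2: score every coordinate by its sample covariance with the
    # preliminary component and keep those above a second threshold.
    score = np.abs(S[:, I1] @ v1)
    thr2 = beta * np.sqrt(lead_val * rate)
    I2 = np.flatnonzero(score > thr2)

    # Final estimate: leading eigenvector on the union of both selections.
    I = np.union1d(I1, I2)
    _, evecs = np.linalg.eigh(S[np.ix_(I, I)])
    theta_hat = np.zeros(p)
    theta_hat[I] = evecs[:, -1]
    return theta_hat, I


# Usage on simulated data from the assumed single-spike model.
rng = np.random.default_rng(0)
n, p, k, lam = 1000, 200, 10, 4.0
theta = np.zeros(p)
theta[:k] = 1.0 / np.sqrt(k)             # sparse unit-norm eigenvector
X = np.sqrt(lam) * rng.standard_normal((n, 1)) * theta + rng.standard_normal((n, p))

theta_hat, selected = two_stage_sparse_pc(X)
sign = np.sign(theta_hat @ theta)        # eigenvectors are defined up to sign
print("selected coordinates:", selected.size)
print("squared l2 loss:", np.sum((sign * theta_hat - theta) ** 2))
```

The second stage is there because a variance threshold alone can miss coordinates whose signal is too weak to stand out on the diagonal of the sample covariance; pooling information through the preliminary component gives those coordinates a second chance to be selected.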

Taxonomy

Topics: Random Matrices and Applications · Statistical Methods and Inference · Bayesian Methods and Mixture Models