TL;DR
This paper explores the fundamental trade-offs between statistical accuracy and computational efficiency in estimating sparse principal components, revealing limitations of polynomial-time algorithms under certain conditions.
Contribution
It demonstrates a computational-statistical gap in sparse PCA estimation and analyzes the performance of a polynomial-time semidefinite relaxation method.
Findings
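Under a widely believed assumption from computational complexity theory, no polynomial-time algorithm achieves the minimax optimal rate in a certain effective sample size regime.
New, larger classes of distributions satisfying a restricted covariance concentration condition are introduced.
A polynomial-time semidefinite relaxation estimator is analyzed to illustrate the trade-off between statistical and computational performance.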
Abstract
In recent years, sparse principal component analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators have been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or sub-Gaussian classes. In this paper, we show that, under a widely-believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a restricted covariance concentration condition, we show that there is an effective sample size regime in which no…
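To make the estimation problem in the abstract concrete, the sketch below simulates data whose population covariance has a sparse leading eigenvector and recovers it with diagonal thresholding, a simple fast heuristic. This is an illustrative assumption, not the semidefinite relaxation or any other estimator analyzed in the paper. The spiked covariance model, the function names, and the parameter values are all hypothetical choices made for the example.

```python
import numpy as np

def sample_spiked_data(n, p, k, theta, rng):
    """Draw n mean-zero Gaussian rows with covariance I_p + theta * v v^T.

    v is a unit vector supported on the first k coordinates (assumed model).
    """
    v = np.zeros(p)
    v[:k] = 1.0 / np.sqrt(k)
    g = rng.standard_normal((n, 1))   # one latent factor per observation
    Z = rng.standard_normal((n, p))   # isotropic noise
    X = np.sqrt(theta) * g * v + Z    # covariance is I_p + theta * v v^T
    return X, v

def diagonal_thresholding(X, k):
    """Keep the k coordinates with the largest sample variance, then take the
    leading eigenvector of the sample covariance restricted to them."""
    n, p = X.shape
    S = X.T @ X / n                         # sample covariance (mean known to be 0)
    support = np.argsort(np.diag(S))[-k:]   # indices of the k largest variances
    sub = S[np.ix_(support, support)]
    _, eigvecs = np.linalg.eigh(sub)        # eigenvalues in ascending order
    v_hat = np.zeros(p)
    v_hat[support] = eigvecs[:, -1]         # leading eigenvector of the submatrix
    return v_hat

def sine_loss(u, v):
    """Sine of the angle between unit vectors u and v (invariant to sign)."""
    return np.sqrt(max(0.0, 1.0 - float(u @ v) ** 2))

rng = np.random.default_rng(0)
X, v = sample_spiked_data(n=200, p=50, k=5, theta=4.0, rng=rng)
v_hat = diagonal_thresholding(X, k=5)
print("sine loss:", sine_loss(v_hat, v))
```

The estimate is k-sparse by construction and costs only one pass over the variances plus a k-by-k eigendecomposition. The abstract's point is that fast procedures of this kind may not attain the minimax optimal rate in every regime.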
Videos
Statistical and computational trade-offs in estimation of sparse principal components (YouTube)
