Combinatorial Sparse PCA Beyond the Spiked Identity Model

Syamantak Kumar; Purnamrita Sarkar; Kevin Tian; Peiyuan Zhang

arXiv:2603.02607 · stat.ML · March 4, 2026

TL;DR

This paper introduces a new combinatorial algorithm for sparse PCA that works beyond the traditional spiked identity model, providing theoretical guarantees and practical evaluation on real data.

Contribution

It presents the first combinatorial method with provable success for general covariance matrices in sparse PCA, extending beyond the spiked identity model.

Findings

01 Counterexamples show limitations of existing combinatorial algorithms

02 New combinatorial method with global convergence guarantees

03 Method performs well on synthetic and real-world datasets

Abstract

Sparse PCA is one of the most well-studied problems in high-dimensional statistics. In this problem, we are given samples from a distribution with covariance $\Sigma$, whose top eigenvector $v \in \mathbb{R}^{d}$ is $s$-sparse. Existing sparse PCA algorithms can be broadly categorized into (1) combinatorial algorithms (e.g., diagonal or elementwise covariance thresholding) and (2) SDP-based algorithms. While combinatorial algorithms are much simpler, they are typically only analyzed under the spiked identity model (where $\Sigma = I_{d} + \gamma v v^{\top}$ for some $\gamma > 0$), whereas SDP-based algorithms require no additional assumptions on $\Sigma$. We demonstrate explicit counterexample covariances $\Sigma$ against the success of standard combinatorial algorithms for sparse PCA, when moving beyond the spiked identity model. In light of this discrepancy, we give the first combinatorial method…
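To make the setting concrete, the following is a minimal sketch of diagonal thresholding, one of the standard combinatorial baselines the abstract mentions, run on data drawn from the spiked identity model. It is not the paper's new method. The function names (`sample_spiked_identity`, `diagonal_thresholding`), the Gaussian sampling, the flat choice of $v$ on its support, and the parameter values are all assumptions made for illustration.

```python
import numpy as np


def sample_spiked_identity(n, d, s, gamma, rng):
    """Draw n Gaussian samples with covariance I_d + gamma * v v^T.

    Illustrative assumption: v is an s-sparse unit vector with equal
    entries 1/sqrt(s) on a uniformly random support.
    """
    support = rng.choice(d, size=s, replace=False)
    v = np.zeros(d)
    v[support] = 1.0 / np.sqrt(s)
    z = rng.standard_normal((n, d))   # isotropic noise, covariance I_d
    g = rng.standard_normal((n, 1))   # scalar spike coefficient per sample
    X = z + np.sqrt(gamma) * g * v    # covariance is I_d + gamma * v v^T
    return X, v


def diagonal_thresholding(X, s):
    """Estimate an s-sparse top eigenvector by diagonal thresholding.

    Keeps the s coordinates with the largest empirical variance, then
    returns the top eigenvector of the covariance restricted to them.
    """
    n, d = X.shape
    cov = X.T @ X / n                          # empirical covariance (mean zero)
    idx = np.argsort(np.diag(cov))[-s:]        # s largest diagonal entries
    sub = cov[np.ix_(idx, idx)]                # s x s principal submatrix
    _, eigvecs = np.linalg.eigh(sub)           # eigenvalues in ascending order
    v_hat = np.zeros(d)
    v_hat[idx] = eigvecs[:, -1]                # top eigenvector, embedded in R^d
    return v_hat


rng = np.random.default_rng(0)
X, v = sample_spiked_identity(n=5000, d=200, s=5, gamma=4.0, rng=rng)
v_hat = diagonal_thresholding(X, s=5)
alignment = abs(v_hat @ v)  # |<v_hat, v>|, equal to 1 for perfect recovery up to sign
print(f"alignment with the planted direction: {alignment:.3f}")
```

The selection step relies on the support coordinates having larger variance than the rest, which holds under $\Sigma = I_{d} + \gamma v v^{\top}$ because the diagonal is $1 + \gamma v_i^2$. Nothing in the sketch enforces that property for a general $\Sigma$, which is the regime the paper's counterexamples target.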

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Machine Learning and Algorithms