Support Recovery in Sparse PCA with Incomplete Data

Hanbyul Lee; Qifan Song; Jean Honorio

arXiv:2205.15215·stat.ML·September 16, 2022

TL;DR

This paper studies a semidefinite programming (SDP) approach to sparse PCA that, under sufficient conditions, exactly recovers the support of the sparse leading eigenvector from incomplete and noisy data, backed by theoretical guarantees and experiments.

Contribution

The paper introduces a novel SDP-based algorithm for sparse PCA with incomplete data, providing theoretical conditions for exact support recovery and demonstrating its effectiveness through experiments.
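
For orientation, the relaxation can be written out in its standard form. This is a sketch under assumptions: $M$ denotes the observed matrix and $\rho > 0$ a regularization weight (both symbols are introduced here, not taken from this page), and the paper's exact formulation may differ in details. The non-convex $l_{1}$-regularized problem over unit vectors is

$$
\max_{x \in \mathbb{R}^{d},\ \|x\|_{2} = 1} \; x^{\top} M x - \rho \, \|x\|_{1}^{2} .
$$

Substituting $X = x x^{\top}$, so that $\operatorname{tr}(X) = 1$ and $\sum_{i,j} |X_{ij}| = \|x\|_{1}^{2}$, and then dropping the rank-one constraint gives the convex SDP

$$
\max_{X \succeq 0,\ \operatorname{tr}(X) = 1} \; \operatorname{tr}(M X) - \rho \sum_{i,j} |X_{ij}| .
$$

The support estimate is read off from the nonzero diagonal entries of the solution $X$.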

Findings

01 Exact support recovery under sufficient conditions on matrix incoherence, spectral gap, observation probability, and noise variance

02 Effective on synthetic and real gene expression data

03 Theoretical guarantees for recovery performance

Abstract

We study a practical algorithm for sparse principal component analysis (PCA) of incomplete and noisy data. Our algorithm is based on the semidefinite program (SDP) relaxation of the non-convex $l_{1}$-regularized PCA problem. We provide theoretical and experimental evidence that SDP enables us to exactly recover the true support of the sparse leading eigenvector of the unknown true matrix, despite only observing an incomplete (missing uniformly at random) and noisy version of it. We derive sufficient conditions for exact recovery, which involve matrix incoherence, the spectral gap between the largest and second-largest eigenvalues, the observation probability and the noise variance. We validate our theoretical results with incomplete synthetic data, and show encouraging and meaningful results on a gene expression dataset.
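
The setting in the abstract can be illustrated with a small synthetic experiment. The sketch below is not the authors' code: it assumes the standard SDP form (maximize the inner product of the observed matrix with X minus an entrywise l1 penalty, over PSD matrices with unit trace), solves it with a simple ADMM loop chosen here for self-containment, and zero-fills missing entries. All names and parameter values (`sparse_pca_sdp`, `rho`, `tau`, `d`, `s`, `lam`, `p`, `sigma`) are hypothetical choices for illustration.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    idx = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[idx] / (idx + 1.0)
    return np.maximum(v - theta, 0.0)

def project_spectrahedron(A):
    """Projection onto {X : X PSD, trace(X) = 1} via the eigenvalues."""
    A = (A + A.T) / 2.0
    w, V = np.linalg.eigh(A)
    return (V * project_simplex(w)) @ V.T

def soft_threshold(A, t):
    """Entrywise soft-thresholding (proximal map of t * l1 norm)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def sparse_pca_sdp(M, rho, tau=1.0, n_iter=500):
    """ADMM sketch for: max <M, X> - rho * sum|X_ij|, X PSD, trace(X) = 1.

    The solver choice is an assumption of this example, not the paper's.
    Returns the sparse iterate Y, which has exact zeros.
    """
    d = M.shape[0]
    Y = np.eye(d) / d
    U = np.zeros((d, d))
    for _ in range(n_iter):
        X = project_spectrahedron(Y - U + M / tau)  # constraint step
        Y = soft_threshold(X + U, rho / tau)        # sparsity step
        U = U + X - Y                               # scaled dual update
    return Y

# Synthetic data: rank-one matrix with a sparse leading eigenvector.
rng = np.random.default_rng(0)
d, s, lam, p, sigma = 15, 5, 10.0, 0.9, 0.05  # illustrative values
u = np.zeros(d)
u[:s] = 1.0 / np.sqrt(s)
M_true = lam * np.outer(u, u)

# Symmetric Gaussian noise and a symmetric uniformly-at-random mask.
noise = rng.normal(0.0, sigma, (d, d))
noise = np.triu(noise) + np.triu(noise, 1).T
mask = (rng.random((d, d)) < p).astype(float)
mask = np.triu(mask) + np.triu(mask, 1).T
M_obs = mask * (M_true + noise)  # unobserved entries are zero-filled

Y = sparse_pca_sdp(M_obs, rho=0.3)
support_est = np.flatnonzero(np.diag(Y) > 1e-6)
print("estimated support:", support_est)
print("true support:     ", np.flatnonzero(u))
```

Reading the support from the diagonal of the thresholded iterate `Y` rather than from `X` avoids picking a numerical tolerance for entries that are only approximately zero. How well the estimate matches the true support depends on `rho`, the signal strength `lam`, the observation probability `p`, and the noise level `sigma`, in line with the kinds of conditions the abstract lists.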

Videos

Support Recovery in Sparse PCA with Incomplete Data · SlidesLive

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Blind Source Separation Techniques · Geochemistry and Geologic Mapping

Methods: Principal Components Analysis