Penalized Principal Component Analysis Using Smoothing
Rebecca M. Hurwitz, Georg Hahn

TL;DR
This paper introduces a smoothed penalized eigenvalue problem for PCA that enhances stability and efficiency, enabling better sparse eigenvector computation and improved performance in genomic data analysis and clustering.
Contribution
The article extends penalized eigenvalue problems by incorporating smoothing, allowing analytical gradients for faster optimization, and demonstrates its effectiveness in genomic and clustering applications.
Findings
Smoothed PEP improves numerical stability and eigenvector interpretability.
Enhanced prediction accuracy in polygenic risk scores.
Outperforms seven state-of-the-art sparse PCA algorithms in accuracy and runtime.
Abstract
Principal components computed via PCA (principal component analysis) are traditionally used to reduce dimensionality in genomic data or to correct for population stratification. In this paper, we explore the penalized eigenvalue problem (PEP), which reformulates the computation of the first eigenvector as an optimization problem and adds an L1 penalty constraint to enforce sparseness of the solution. The contribution of our article is threefold. First, we extend PEP by applying smoothing to the original LASSO-type penalty. This allows one to compute analytical gradients which enable faster and more efficient minimization of the objective function associated with the optimization problem. Second, we demonstrate how higher order eigenvectors can be calculated with PEP using established results from singular value decomposition (SVD). Third, we present four experimental studies to…
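The idea of smoothing the LASSO-type penalty so that the objective has an analytical gradient can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the surrogate sqrt(x^2 + mu^2) for |x| is a generic smooth approximation (the paper's smoothing may differ), the optimizer is plain projected gradient ascent on the unit sphere, and the names smooth_abs, objective, gradient, and smoothed_pep are hypothetical.

```python
import numpy as np

def smooth_abs(x, mu):
    # Smooth surrogate for |x| (an assumption; the paper may use another smoothing).
    return np.sqrt(x**2 + mu**2)

def smooth_abs_grad(x, mu):
    # Derivative of the surrogate; well defined at x = 0, unlike that of |x|.
    return x / np.sqrt(x**2 + mu**2)

def objective(v, A, lam, mu):
    # Penalized eigenvalue objective: variance term minus smoothed L1 penalty.
    return v @ A @ v - lam * np.sum(smooth_abs(v, mu))

def gradient(v, A, lam, mu):
    # Analytical gradient of the objective, assuming A is symmetric.
    return 2 * A @ v - lam * smooth_abs_grad(v, mu)

def smoothed_pep(A, lam=0.1, mu=1e-3, iters=500, seed=0):
    # Projected gradient ascent: gradient step, then renormalize to unit length.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    # Conservative step size derived from a bound on the gradient's Lipschitz constant.
    step = 1.0 / (2 * np.linalg.norm(A, 2) + lam / mu)
    for _ in range(iters):
        v = v + step * gradient(v, A, lam, mu)
        v /= np.linalg.norm(v)
    return v
```

With lam=0 the update reduces to a shifted power iteration and returns the ordinary leading eigenvector of a symmetric positive semidefinite A; increasing lam trades explained variance for smaller entries in v. A smaller mu tracks |x| more closely but forces a smaller step size, which is the usual trade-off with this kind of surrogate.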
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks
Methods: Principal Components Analysis
