Testing significance of features by lassoed principal components
Daniela M. Witten, Robert Tibshirani

TL;DR
This paper introduces Lassoed Principal Components (LPC), a new method for testing feature significance in high-dimensional data that yields notably lower false discovery rates than standard gene-scoring approaches in gene expression analysis.
Contribution
LPC combines principal components analysis (PCA) with a lasso (L1) penalty to improve feature significance testing, offering a flexible and theoretically justified alternative to traditional methods.
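One way to write the lasso de-noising step as a formula, as we read the abstract. The symbols are our own and not necessarily the paper's notation. Let $T$ be the $p$-vector of conventional gene scores, let $V$ be a $p \times k$ matrix whose orthonormal columns $v_1, \dots, v_k$ are eigenvectors of the gene expression covariance matrix, and let $\lambda \ge 0$ be the penalty. Because $V^\top V = I$, the lasso regression of $T$ on $V$ has a closed-form soft-thresholding solution:

```latex
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^k} \tfrac{1}{2}\lVert T - V\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad
\hat{\beta}_j = \operatorname{sign}\!\left(v_j^\top T\right)\left(\lvert v_j^\top T \rvert - \lambda\right)_+,
\qquad
T^{\mathrm{LPC}} = V\hat{\beta}.
```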
Findings
LPC reduces false discovery rates compared to standard methods.
LPC effectively identifies significant genes in real and simulated data.
A theoretical framework supports the use of LPC for feature significance testing.
Abstract
We consider the problem of testing the significance of features in high-dimensional settings. In particular, we test for differentially expressed genes in a microarray experiment. We wish to identify genes that are associated with some type of outcome, such as survival time or cancer type. We propose a new procedure, called Lassoed Principal Components (LPC), that builds upon existing methods and can provide a sizable improvement. For instance, in the case of two-class data, a standard (albeit simple) approach might be to compute a two-sample t-statistic for each gene. The LPC method involves projecting these conventional gene scores onto the eigenvectors of the gene expression data covariance matrix and then applying an L1 penalty in order to de-noise the resulting projections. We present a theoretical framework under which LPC is the logical choice for identifying significant…
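The three steps the abstract describes (a two-sample t-statistic per gene, projection of those scores onto eigenvectors of the gene expression covariance matrix, and L1 de-noising of the projections) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the pooled-variance t-statistic, taking the right singular vectors of the centered data as the covariance eigenvectors, the fixed penalty `lam`, the optional truncation `k`, and all function names are hypothetical choices of ours. Because the eigenvectors are orthonormal, the lasso fit reduces to soft-thresholding the projections.

```python
import numpy as np


def two_sample_t(X, y):
    """Per-gene two-sample t-statistics using a pooled variance (assumption).

    X: (n samples, p genes) expression matrix; y: length-n labels in {0, 1}.
    """
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class variance for each gene.
    s2 = (((X0 - m0) ** 2).sum(axis=0) + ((X1 - m1) ** 2).sum(axis=0)) / (n0 + n1 - 2)
    return (m1 - m0) / np.sqrt(s2 * (1.0 / n0 + 1.0 / n1))


def soft_threshold(z, lam):
    """Lasso solution for an orthonormal design: shrink each entry toward 0 by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lpc_scores(X, scores, lam, k=None):
    """Hypothetical sketch of LPC-style de-noising of per-gene scores.

    Projects the p gene scores onto eigenvectors of the gene-by-gene
    covariance of X, soft-thresholds the projections (an L1 penalty),
    and maps the result back to one score per gene.
    """
    Xc = X - X.mean(axis=0)  # center each gene
    # Right singular vectors of Xc are eigenvectors of the p x p covariance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt.T if k is None else Vt[:k].T  # (p, k) with orthonormal columns
    coef = V.T @ scores  # projections of the scores onto the eigenvectors
    beta = soft_threshold(coef, lam)  # L1 de-noising of the projections
    return V @ beta  # de-noised score for each gene


if __name__ == "__main__":
    # Synthetic example (not data from the paper): 20 arrays, 50 genes,
    # with the first 5 genes shifted upward in class 1.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 50))
    y = np.array([0] * 10 + [1] * 10)
    X[y == 1, :5] += 2.0
    t = two_sample_t(X, y)
    lpc = lpc_scores(X, t, lam=1.0)
    print(lpc.shape)  # (50,)
```

With `lam=0` the sketch simply projects the t-statistics onto the span of the chosen eigenvectors; increasing `lam` zeroes out weak projections. How the penalty (and the number of eigenvectors) should be chosen is not specified here, so the values above are arbitrary.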
