A variable selection approach for highly correlated predictors in high-dimensional genomic data
Wencan Zhu, Céline Lévy-Leduc, Nils Ternès

TL;DR
This paper introduces WLasso, a novel variable selection method for high-dimensional genomic data that effectively handles highly correlated predictors, outperforming existing methods in simulations and real breast cancer data.
Contribution
The paper proposes WLasso, a new approach that accounts for predictor correlations in high-dimensional linear models, improving variable selection accuracy in genomic studies.
Findings
WLasso outperforms existing methods in simulated highly correlated data scenarios.
WLasso successfully identifies relevant biomarkers in breast cancer gene expression data.
The method is implemented in an accessible R package.
Abstract
In genomic studies, identifying biomarkers associated with a variable of interest is a major concern in biomedical research. Regularized approaches are classically used to perform variable selection in high-dimensional linear models. However, these methods can fail in highly correlated settings. We propose a novel variable selection approach called WLasso, taking these correlations into account. It consists in rewriting the initial high-dimensional linear model to remove the correlation between the biomarkers (predictors) and in applying the generalized Lasso criterion. The performance of WLasso is assessed using synthetic data in several scenarios and compared with recent alternative approaches. The results show that when the biomarkers are highly correlated, WLasso outperforms the other approaches in sparse high-dimensional frameworks. The method is also successfully illustrated on…
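The decorrelation step described in the abstract can be illustrated with a small sketch. This is not the authors' R package or its API: it assumes the predictor correlation matrix (called `sigma` here) is known, whereas in practice it would have to be estimated, and the helper names `inverse_sqrt` and `whiten` are hypothetical. It shows only the model rewriting, y = Xβ = (XΣ^{-1/2})(Σ^{1/2}β), and leaves out the generalized Lasso fit that would follow.

```python
import numpy as np


def inverse_sqrt(sigma):
    """Symmetric inverse square root of a positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(sigma)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


def whiten(X, sigma):
    """Return the transformed design X @ sigma^{-1/2} and sigma^{-1/2} itself."""
    w = inverse_sqrt(sigma)
    return X @ w, w


# Toy data (illustrative values only): 5 predictors with pairwise correlation 0.9.
rng = np.random.default_rng(0)
n, p, rho = 50, 5, 0.9
sigma = np.full((p, p), rho) + (1 - rho) * np.eye(p)  # assumed known here
X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0])           # sparse coefficients

X_tilde, w = whiten(X, sigma)
beta_tilde = np.linalg.inv(w) @ beta                  # equals sigma^{1/2} @ beta

# The rewritten model gives the same linear predictor as the original one.
assert np.allclose(X @ beta, X_tilde @ beta_tilde)

# The population covariance of the transformed predictors is the identity:
# sigma^{-1/2} @ sigma @ sigma^{-1/2} = I.
assert np.allclose(w @ sigma @ w, np.eye(p))
```

Note that `beta_tilde` is generally not sparse even when `beta` is, which is one reason a plain Lasso penalty on the transformed coefficients would not be enough and a generalized Lasso criterion is used after the rewriting.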
Taxonomy
Topics: Gene expression and cancer classification · Statistical Methods and Inference · Genetic and phenotypic traits in livestock
