A statistical methodology to select covariates in high-dimensional data under dependence. Application to the classification of genetic profiles in oncology
Bérangère Bastien, Taha Boukhobza (CRAN), Hélène Dumond (CRAN), Anne Gégout-Petit (BIGS, IECL), Aurélie Muller-Gueudin (BIGS, IECL), Charlène Thiébaut (CRAN)

TL;DR
This paper introduces a new statistical methodology for selecting and ranking covariates in high-dimensional, dependent data, demonstrated through applications in oncology to identify relevant genetic markers and patient profiles.
Contribution
The paper presents a novel approach combining clustering, decorrelation, and aggregation techniques for covariate selection in complex high-dimensional datasets with dependence structures.
Findings
Decorrelating covariates improves selection accuracy.
Method successfully identifies relevant genetic covariates in cancer data.
Application reveals new insights into patient-specific genetic profiles.
Abstract
We propose a new methodology for selecting and ranking covariates associated with a variable of interest in the context of high-dimensional data under dependence with few observations. The methodology successively combines the clustering of covariates, the decorrelation of covariates using Factor Latent Analysis, selection by aggregating adapted methods, and finally ranking. A simulation study shows the benefit of decorrelating within the different clusters of covariates. We first apply our method to transcriptomic data from 37 patients with advanced non-small-cell lung cancer who received chemotherapy, to select the transcriptomic covariates that explain the survival outcome of the treatment. Second, we apply our method to 79 breast tumor samples to define patient profiles for a new metastatic biomarker and its associated gene network, in order to personalize treatments.
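The four stages named in the abstract (cluster, decorrelate within clusters, select by aggregation, rank) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every function name and parameter below is hypothetical. Three simplifying assumptions are made: clustering is a greedy threshold on absolute correlation; decorrelation removes the leading principal component of each cluster as a stand-in for the latent factor model; and the "adapted methods" being aggregated are replaced by two simple scorers (absolute Pearson and rank correlation) voted over random subsamples.

```python
import numpy as np

def cluster_covariates(X, threshold=0.5):
    """Greedy grouping (assumption): unassigned covariates whose absolute
    correlation with a seed covariate reaches the threshold join its cluster."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    labels = -np.ones(X.shape[1], dtype=int)
    k = 0
    for j in range(X.shape[1]):
        if labels[j] == -1:
            labels[(labels == -1) & (corr[j] >= threshold)] = k
            k += 1
    return labels

def decorrelate_cluster(Xc, n_factors=1):
    """Center a cluster and remove its leading principal components
    (a PCA stand-in for the factor model, not the paper's estimator)."""
    Xc = Xc - Xc.mean(axis=0)
    if Xc.shape[1] <= n_factors:
        return Xc  # too few covariates to estimate a common factor
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    s[:n_factors] = 0.0
    return (U * s) @ Vt

def abs_pearson(X, y):
    """Absolute Pearson correlation of each column of X with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / np.maximum(denom, 1e-12)

def abs_spearman(X, y):
    """Absolute rank correlation (ties are not averaged in this sketch)."""
    def rank(a):
        return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)
    return abs_pearson(rank(X), rank(y))

def rank_covariates(X, y, threshold=0.5, top_m=5, n_rounds=50, seed=0):
    """Cluster, decorrelate per cluster, then rank covariates by how often
    either scorer places them in its top_m over random half-subsamples."""
    rng = np.random.default_rng(seed)
    labels = cluster_covariates(X, threshold)
    Z = np.empty_like(X, dtype=float)
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        Z[:, idx] = decorrelate_cluster(X[:, idx])
    votes = np.zeros(X.shape[1])
    n = X.shape[0]
    for _ in range(n_rounds):
        sub = rng.choice(n, size=n // 2, replace=False)
        for scorer in (abs_pearson, abs_spearman):
            votes[np.argsort(scorer(Z[sub], y[sub]))[-top_m:]] += 1
    freq = votes / (2 * n_rounds)  # selection frequency in [0, 1]
    return np.argsort(-freq), freq
```

Calling `order, freq = rank_covariates(X, y)` on an `n × p` array `X` and a length-`n` numeric outcome `y` returns covariate indices sorted by decreasing selection frequency. A survival outcome, as in the lung cancer application, would need scorers suited to censored data, which this sketch does not cover.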
