Comparison study of variable selection procedures in high-dimensional Gaussian linear regression
Perrine Lacroix, Mélina Gallopin, Marie-Laure Martin

TL;DR
This paper conducts a comprehensive simulation comparison of variable selection methods based on regularization paths in high-dimensional Gaussian linear regression, providing practical recommendations for different performance metrics.
Contribution
It offers an extensive evaluation of various variable selection procedures under diverse settings, highlighting their strengths and limitations in high-dimensional contexts.
Findings
No single method is best across all metrics.
Recommendations vary depending on the performance metric.
Model assumptions like high dimensionality and Gaussianity significantly impact results.
Abstract
We propose an extensive simulation study to compare some variable selection procedures in a high-dimensional framework. Assuming that the relationship between the active variables and the response variable is linear, the high-dimensional Gaussian linear regression provides a relevant statistical framework to identify active variables related to the response variable. Many variable selection procedures exist, and in this article, we focus on methods based on regularization paths. We perform a comparison study by considering different simulation settings with various dependency structures for variables and evaluate the performance of the methods by computing several metrics. As expected, no method is optimal for all the evaluated performance metrics, but we provide recommendations for the best procedures according to the metric to control. Lastly, we test the importance of some assumptions of…
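To make the evaluation idea concrete, here is a minimal sketch of how a regularization path can be scored against a known set of active variables in a simulation. The abstract does not name the metrics used, so recall and false discovery proportion (FDP) are assumed here purely for illustration. The path values, the `support` and `selection_metrics` helpers, and the `active` set are all hypothetical and not taken from the paper.

```python
def support(coefs, tol=1e-12):
    """Return the indices of variables whose coefficient is nonzero."""
    return {j for j, b in enumerate(coefs) if abs(b) > tol}


def selection_metrics(selected, active):
    """Return (recall, false discovery proportion) of a selected set.

    Conventions assumed here: recall is 1.0 when there are no active
    variables, and FDP is 0.0 when nothing is selected.
    """
    true_pos = len(selected & active)
    recall = true_pos / len(active) if active else 1.0
    fdp = (len(selected) - true_pos) / len(selected) if selected else 0.0
    return recall, fdp


# Hypothetical regularization path: one coefficient vector per penalty
# level, ordered from the strongest penalty to the weakest.
path = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.2, 0.0, 0.0, 0.0, 0.0],
    [1.5, 0.0, 0.4, 0.0, 0.0],
    [1.6, 0.1, 0.6, 0.0, 0.0],
]

# Hypothetical ground truth: variables 0 and 2 are active.
active = {0, 2}

# Score the selected set at every step of the path.
metrics = [selection_metrics(support(b), active) for b in path]
for step, (recall, fdp) in enumerate(metrics):
    print(f"step {step}: recall={recall:.2f}, FDP={fdp:.2f}")
```

In this toy path, relaxing the penalty first recovers both active variables and then admits a false positive, which is the kind of trade-off that makes the preferred procedure depend on the metric to control.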
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Gene expression and cancer classification · Computational Drug Discovery Methods
