An ensemble learning method for variable selection: application to high dimensional data and missing values
Avner Bar-Hen, Vincent Audigier

TL;DR
This paper introduces an ensemble learning-based variable selection method tailored for high-dimensional and incomplete data, demonstrating improved accuracy and error control over existing methods through simulations and real data applications.
Contribution
It proposes a novel ensemble variable selection approach that effectively handles high-dimensional data with missing values, extending classical methods and providing theoretical and empirical validation.
Findings
Improves error risk control, especially type I error, in low-dimensional settings.
Outperforms multiple-imputation-based methods when data are missing.
Performs similarly in high-dimensional settings with or without missing data.
Abstract
Standard approaches for variable selection in linear models are not tailored to deal properly with high-dimensional and incomplete data. Currently, methods dedicated to high-dimensional data handle missing values by ad-hoc strategies, like complete case analysis or single imputation, while methods dedicated to missing values, mainly based on multiple imputation, do not discuss the imputation method to use with high-dimensional data. Consequently, both approaches appear to be limited for many modern applications. With inspiration from ensemble methods, a new variable selection method is proposed. It extends classical variable selection methods in the case of high-dimensional data with or without missing data. Theoretical properties are studied and the practical interest is demonstrated through a simulation study, as well as through an application to model specification in sequential…
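To make the general idea of ensemble-style variable selection concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' algorithm: the function names (`select_by_t_test`, `ensemble_select`), the t-statistic rule used as the "classical" selector, the block size, the number of draws, and the 0.5 threshold are all hypothetical choices made for this example. The sketch repeatedly draws a small random block of variables, keeps only the rows that are complete on that block, runs a classical selection step on the low-dimensional sub-problem, and finally keeps the variables that were selected in a large enough share of the draws in which they appeared.

```python
import numpy as np


def select_by_t_test(X, y, t_crit=2.0):
    """Hypothetical classical selector: OLS on a small block, keep |t| > t_crit."""
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])          # add an intercept column
    beta, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - k - 1)           # residual variance estimate
    cov = sigma2 * np.linalg.pinv(Xd.T @ Xd)       # covariance of the estimates
    t_stats = beta[1:] / np.sqrt(np.diag(cov)[1:])
    return np.abs(t_stats) > t_crit


def ensemble_select(X, y, block_size=5, n_draws=300, threshold=0.5, seed=0):
    """Aggregate selections over random variable blocks (illustrative only).

    Missing values are encoded as NaN. Each draw uses the rows that are
    complete on the drawn block, so no imputation is performed.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    drawn = np.zeros(p)   # how often each variable was part of a block
    kept = np.zeros(p)    # how often it was selected when drawn
    for _ in range(n_draws):
        cols = rng.choice(p, size=block_size, replace=False)
        rows = ~np.isnan(X[:, cols]).any(axis=1) & ~np.isnan(y)
        if rows.sum() <= block_size + 1:           # too few complete rows to fit
            continue
        sel = select_by_t_test(X[rows][:, cols], y[rows])
        drawn[cols] += 1
        kept[cols] += sel
    ratio = np.divide(kept, drawn, out=np.zeros(p), where=drawn > 0)
    return ratio, ratio > threshold


if __name__ == "__main__":
    # Simulated data (assumption): 3 informative variables out of 30, 10% NaN.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 30))
    y = 3.0 * X[:, :3].sum(axis=1) + rng.normal(size=200)
    X[rng.random(X.shape) < 0.10] = np.nan
    ratio, selected = ensemble_select(X, y)
    print("selected variables:", np.flatnonzero(selected))
```

Because each sub-problem involves only a few variables, the classical selector stays in a low-dimensional regime even when the total number of variables is large, and a row with a missing entry is discarded only for the draws whose block contains that entry.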
