An ensemble learning method for variable selection: application to high dimensional data and missing values

Avner Bar-Hen; Vincent Audigier

arXiv:1808.06952·stat.ME·June 9, 2021

TL;DR

This paper introduces an ensemble learning-based variable selection method tailored for high-dimensional and incomplete data, demonstrating improved accuracy and error control over existing methods through simulations and real data applications.

Contribution

It proposes a novel ensemble variable selection approach that effectively handles high-dimensional data with missing values, extending classical methods and providing theoretical and empirical validation.

Findings

01

Improves error risk control, especially type I error, in low-dimensional settings.

02

Performs better than multiple imputation-based methods with missing data.

03

Achieves similar high-dimensional performance with or without missing data.

Abstract

Standard approaches for variable selection in linear models are not tailored to deal properly with high-dimensional and incomplete data. Currently, methods dedicated to high-dimensional data handle missing values by ad-hoc strategies, like complete case analysis or single imputation, while methods dedicated to missing values, mainly based on multiple imputation, do not discuss the imputation method to use with high-dimensional data. Consequently, both approaches appear to be limited for many modern applications. With inspiration from ensemble methods, a new variable selection method is proposed. It extends classical variable selection methods in the case of high-dimensional data with or without missing data. Theoretical properties are studied and the practical interest is demonstrated through a simulation study, as well as through an application to models specification in sequential…
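The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of ensemble-style variable selection, not the authors' procedure. It assumes (hypothetically) that a classical selection rule is applied to many small random subsets of variables, that each subset is fitted on the rows fully observed for that subset, and that a variable is kept when it is selected in a large enough share of the subsets containing it. The function name `ensemble_select`, the |t| cutoff, and the 0.5 threshold are all assumptions made for this example.

```python
import numpy as np

def ensemble_select(X, y, n_subsets=200, subset_size=3,
                    t_crit=2.0, threshold=0.5, seed=0):
    """Illustrative ensemble variable selection (hypothetical sketch).

    X may contain NaN. Each random subset of columns is fitted by ordinary
    least squares on the rows that are complete for that subset only.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    drawn = np.zeros(p)  # times each variable appeared in a fitted subset
    kept = np.zeros(p)   # times each variable passed the |t| cutoff
    for _ in range(n_subsets):
        cols = rng.choice(p, size=subset_size, replace=False)
        sub = X[:, cols]
        rows = ~np.isnan(sub).any(axis=1) & ~np.isnan(y)
        if rows.sum() <= subset_size + 1:
            continue  # too few complete rows to fit this subset
        A = np.column_stack([np.ones(rows.sum()), sub[rows]])
        beta, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        resid = y[rows] - A @ beta
        dof = A.shape[0] - A.shape[1]
        sigma2 = resid @ resid / dof
        cov = sigma2 * np.linalg.pinv(A.T @ A)
        t_stat = beta[1:] / np.sqrt(np.diag(cov)[1:])
        drawn[cols] += 1
        kept[cols] += np.abs(t_stat) > t_crit
    # Selection frequency among the subsets in which each variable was drawn.
    freq = np.divide(kept, drawn, out=np.zeros(p), where=drawn > 0)
    return freq, np.flatnonzero(freq >= threshold)

# Synthetic data (made up for this example): two active variables,
# eight noise variables, about 10% of entries missing at random.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)
X[rng.random(X.shape) < 0.1] = np.nan

freq, selected = ensemble_select(X, y)
print(np.round(freq, 2), selected)
```

Because every fit involves only a few variables, the number of rows lost to missing values stays small and each regression remains well posed even when the total number of variables is large. That is the intuition this sketch is meant to convey; the paper's actual selection rule, aggregation step, and thresholds may differ.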
