Adaptive greedy forward variable selection for linear regression models with incomplete data using multiple imputation

Yong-Shiuan Lee

arXiv:2210.10967 · stat.ME · September 4, 2025

Open Access

TL;DR

This paper introduces an adaptive greedy forward variable selection method for linear regression with missing data, leveraging multiple imputation to improve efficiency and accuracy in variable selection.

Contribution

It proposes an adaptive grafting procedure with three pooling rules that efficiently selects variables from multiply imputed data when observations are incomplete.

Findings

01. High selection accuracy demonstrated in simulations
02. Enhanced computational efficiency over traditional methods
03. Effective real-world data application results

Abstract

Variable selection is crucial for sparse modeling in the age of big data. Missing values are common in data and complicate variable selection. Multiple imputation (MI) produces multiply imputed datasets for the missing values and has been widely applied in variable selection procedures. However, performing variable selection directly on the whole MI data or on bootstrapped MI data may not be worth the computational cost. To quickly identify the active variables in the linear regression model, we propose the adaptive grafting procedure with three pooling rules on MI data. The proposed methods proceed iteratively: they start by finding the active variables based on the complete-case subset and then expand the working data matrix in both the number of active variables and the number of available observations. A comprehensive simulation study shows the selection…
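To make the idea of greedy forward selection on MI data concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not spell out the three pooling rules or the adaptive expansion from the complete-case subset, so the pooling rule used here (averaging the absolute gradient of the squared-error loss across imputations) is an assumption, as are the function name `pooled_forward_selection`, the parameters `max_vars` and `tol`, and the crude noisy mean imputation that stands in for a real MI procedure.

```python
import numpy as np

def pooled_forward_selection(imputed_sets, y, max_vars=5, tol=1e-3):
    """Greedy forward selection pooled over M imputed design matrices.

    Hypothetical pooling rule (an assumption, not taken from the paper):
    average |X^T r| / n across imputations, where r is the residual of the
    least-squares fit on the currently active variables.
    """
    n, p = imputed_sets[0].shape
    active = []
    for _ in range(max_vars):
        pooled = np.zeros(p)
        for X in imputed_sets:
            if active:
                # Refit on the active set within this imputed dataset.
                beta, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
                resid = y - X[:, active] @ beta
            else:
                resid = y.copy()
            # Gradient magnitude of the squared-error loss for every variable.
            pooled += np.abs(X.T @ resid) / n
        pooled /= len(imputed_sets)
        pooled[active] = 0.0  # never re-select an active variable
        j = int(np.argmax(pooled))
        if pooled[j] <= tol:  # no remaining variable has a large enough gradient
            break
        active.append(j)
    return active

# Synthetic example: only columns 0 and 3 are truly active.
rng = np.random.default_rng(0)
n, p, M = 500, 10, 5
X_full = rng.normal(size=(n, p))
y = 3.0 * X_full[:, 0] - 2.0 * X_full[:, 3] + rng.normal(scale=0.5, size=n)

# Remove about 20% of the covariate entries completely at random.
mask = rng.random((n, p)) < 0.2
X_obs = np.where(mask, np.nan, X_full)

# Stand-in for MI (an assumption): column mean plus noise, drawn M times.
col_mean = np.nanmean(X_obs, axis=0)
col_sd = np.nanstd(X_obs, axis=0)
imputed_sets = [
    np.where(mask, col_mean + col_sd * rng.normal(size=(n, p)), X_obs)
    for _ in range(M)
]

selected = pooled_forward_selection(imputed_sets, y, max_vars=2)
print(sorted(selected))
```

The sketch pools a selection criterion before each greedy step rather than selecting separately on each imputed dataset and combining afterwards; other pooling choices are equally plausible, and the abstract alone does not indicate which ones the paper adopts.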

Taxonomy

Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference