Model Selection in High-Dimensional Linear Regression using Boosting with Multiple Testing

George Kapetanios; Vasilis Sarafidis; Alexia Ventouri

arXiv:2602.19705·econ.EM·February 24, 2026

TL;DR

This paper introduces Boosting with Multiple Testing (BMT), a method for high-dimensional linear regression that combines forward stepwise variable selection with a family-wise multiple testing filter, aiming at sparse, interpretable models.

Contribution

The paper proposes BMT, which integrates boosting-style forward selection with multiple testing and is shown to have oracle-type properties relative to an approximating model in high-dimensional settings.

Findings

01. BMT achieves oracle properties and consistent model selection.

02. BMT outperforms Lasso and OCMT in accuracy and RMSE.

03. Empirical results show BMT yields sparse, interpretable models with good out-of-sample performance.

Abstract

High-dimensional regression specification and analysis is a complex and active area of research in statistics, machine learning, and econometrics. This paper proposes a new approach, Boosting with Multiple Testing (BMT), which combines forward stepwise variable selection with the multiple testing framework of Chudik et al. (2018). At each stage, the model is updated by adding only the most significant regressor conditional on those already included, while a family-wise multiple testing filter is applied to the remaining candidates. In this way, the method retains the strong screening properties of Chudik et al. (2018) while operating in a less greedy manner with respect to proxy and noise variables. Using sharp probability inequalities for heterogeneous strongly mixing processes from Dendramis et al. (2022), we show that BMT enjoys oracle-type properties relative to an approximating model…
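The stagewise procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Bonferroni-style critical value with hypothetical tuning parameters `p` and `delta`, the use of plain OLS t-statistics, and the choice to drop candidates that fail the filter from later stages are all assumptions made for illustration only.

```python
import numpy as np
from statistics import NormalDist


def critical_value(n_candidates, p=0.05, delta=1.0):
    """Assumed Bonferroni-style threshold: Phi^{-1}(1 - p / (2 * n**delta))."""
    return NormalDist().inv_cdf(1.0 - p / (2.0 * n_candidates ** delta))


def conditional_t_stats(y, X, selected, candidates):
    """OLS t-statistic of each candidate, conditional on the selected regressors."""
    T = len(y)
    stats = {}
    for j in candidates:
        # Intercept, already selected regressors, then candidate j in the last column.
        Z = np.column_stack([np.ones(T), X[:, selected + [j]]])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sigma2 = resid @ resid / (T - Z.shape[1])
        cov = sigma2 * np.linalg.inv(Z.T @ Z)
        stats[j] = beta[-1] / np.sqrt(cov[-1, -1])
    return stats


def forward_select_with_testing(y, X, p=0.05, delta=1.0, max_steps=None):
    """Add one regressor per stage; keep only candidates that pass the threshold."""
    n = X.shape[1]
    cv = critical_value(n, p, delta)
    selected, candidates = [], list(range(n))
    max_steps = max_steps or min(n, len(y) - 2)
    while candidates and len(selected) < max_steps:
        stats = conditional_t_stats(y, X, selected, candidates)
        # Multiple testing filter (assumption: failures are dropped for good).
        survivors = [j for j in candidates if abs(stats[j]) > cv]
        if not survivors:
            break
        # Add only the single most significant surviving regressor.
        best = max(survivors, key=lambda j: abs(stats[j]))
        selected.append(best)
        candidates = [j for j in survivors if j != best]
    return selected


# Simulated example: 50 candidate regressors, only columns 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 50))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.standard_normal(300)
selected = forward_select_with_testing(y, X)
print(selected)
```

Adding one regressor per stage, rather than every candidate that passes the threshold at once, is what the abstract describes as operating in a less greedy manner; the sketch makes no claim about the exact test statistics or critical values used in the paper.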

Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Imbalanced Data Classification Techniques