Model Selection in High-Dimensional Linear Regression using Boosting with Multiple Testing
George Kapetanios, Vasilis Sarafidis, Alexia Ventouri

TL;DR
This paper introduces Boosting with Multiple Testing (BMT), a new method for high-dimensional linear regression that combines forward stepwise (boosting-style) variable selection with a multiple testing filter to improve model selection and interpretability.
Contribution
The paper proposes BMT, a novel approach that integrates boosting and multiple testing. It establishes oracle type properties relative to an approximating model and improves model selection in high-dimensional settings.
Findings
BMT achieves oracle type properties and consistent model selection.
BMT outperforms Lasso and OCMT (the One Covariate at a time Multiple Testing procedure of Chudik et al., 2018) in accuracy and RMSE.
Empirical results show BMT yields sparse, interpretable models with good out-of-sample performance.
Abstract
High-dimensional regression specification and analysis is a complex and active area of research in statistics, machine learning, and econometrics. This paper proposes a new approach, Boosting with Multiple Testing (BMT), which combines forward stepwise variable selection with the multiple testing framework of Chudik et al. (2018). At each stage, the model is updated by adding only the most significant regressor conditional on those already included, while a family-wise multiple testing filter is applied to the remaining candidates. In this way, the method retains the strong screening properties of Chudik et al. (2018) while operating in a less greedy manner with respect to proxy and noise variables. Using sharp probability inequalities for heterogeneous strongly mixing processes from Dendramis et al. (2022), we show that BMT enjoys oracle type properties relative to an approximating model…
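The stage-wise loop described in the abstract can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an intercept is always included, computes exact OLS t-statistics for each candidate conditional on the included regressors via Frisch-Waugh-Lovell residualization, uses an OCMT-style critical value c_p(n, δ) = Φ⁻¹(1 − p/(2n^δ)) with illustrative defaults p = 0.05 and δ = 1, and assumes that candidates failing the threshold at a stage are dropped for good. The function names `bmt_sketch` and `critical_value` are hypothetical, and the paper's actual filter, stopping rule and treatment of strongly mixing data may differ.

```python
import math
import random
from statistics import NormalDist


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _residualize(v, basis):
    """Remove the projection of v onto an orthonormal basis (Gram-Schmidt step)."""
    r = list(v)
    for q in basis:
        c = _dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def critical_value(n, p=0.05, delta=1.0):
    # Assumed OCMT-style threshold: c_p(n, delta) = Phi^{-1}(1 - p / (2 n^delta)).
    return NormalDist().inv_cdf(1 - p / (2 * n ** delta))


def bmt_sketch(y, X, p=0.05, delta=1.0):
    """Hypothetical BMT-style forward selection with a multiple testing filter.

    y: list of T floats; X: list of n candidate columns, each a list of T floats.
    Returns indices of selected columns in order of entry.
    """
    T, n = len(y), len(X)
    c = critical_value(n, p, delta)
    basis = [[1 / math.sqrt(T)] * T]  # intercept always included (assumption)
    selected, active = [], set(range(n))
    while active:
        df = T - len(basis) - 1  # residual degrees of freedom with one more regressor
        if df <= 0:
            break
        ry = _residualize(y, basis)
        tstats = {}
        for j in active:
            # By Frisch-Waugh-Lovell, regressing residualized y on residualized x_j
            # gives the same coefficient and residuals as the full OLS regression.
            rx = _residualize(X[j], basis)
            sxx = _dot(rx, rx)
            if sxx < 1e-12:  # x_j is (numerically) spanned by included regressors
                continue
            b = _dot(rx, ry) / sxx
            rss = _dot(ry, ry) - b * b * sxx
            se = math.sqrt(max(rss, 0.0) / df / sxx)
            tstats[j] = abs(b) / se if se > 0 else math.inf
        # Family-wise filter (assumed): keep only candidates with |t| above c.
        active = {j for j, t in tstats.items() if t > c}
        if not active:
            break
        # Add only the single most significant surviving candidate.
        best = max(active, key=tstats.get)
        selected.append(best)
        active.discard(best)
        rx = _residualize(X[best], basis)
        norm = math.sqrt(_dot(rx, rx))
        basis.append([v / norm for v in rx])
    return selected


# Usage on simulated data: only columns 0 and 3 carry signal.
random.seed(0)
T, n = 200, 20
X = [[random.gauss(0, 1) for _ in range(T)] for _ in range(n)]
y = [1 + 2 * X[0][t] - 1.5 * X[3][t] + random.gauss(0, 1) for t in range(T)]
selected = bmt_sketch(y, X)
print("selected columns:", selected)
```

With signals this strong relative to the noise, the two true regressors should enter in the first two stages. Because only one regressor is added per stage and every remaining t-statistic is recomputed conditional on the current model, a noise or proxy variable that looked significant marginally can fall below the threshold once the true signals are included, which is one way to read the abstract's claim that the method is less greedy than one-shot screening.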
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Imbalanced Data Classification Techniques
