A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation
Seokhyun Chung, Young Woong Park, Taesu Cheong

TL;DR
This paper introduces an automated mathematical programming method for selecting and validating subsets of explanatory variables in multiple linear regression. The method minimizes errors while satisfying key regression assumptions, improving on traditional trial-and-error approaches.
Contribution
It presents a novel integrated model that uses mathematical programming to automate both subset selection and validation for linear regression, yielding models that better satisfy the regression assumptions.
Findings
Outperforms benchmark models in assumption satisfaction
Maintains comparable explanatory power
Provides alternative subsets when assumptions cannot be fully met
Abstract
Subset selection for multiple linear regression aims to construct a regression model that minimizes errors by selecting a small number of explanatory variables. Once a model is built, various statistical tests and diagnostics are conducted to validate the model and to determine whether the regression assumptions are met. Most traditional approaches require human decisions at this step: for example, the user adds or removes variables until a satisfactory model is obtained. However, this trial-and-error strategy cannot guarantee finding a subset that minimizes the errors while satisfying all regression assumptions. In this paper, we propose a fully automated model building procedure for multiple linear regression subset selection that integrates model building and validation based on mathematical programming. The proposed model minimizes mean squared errors while ensuring…
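The abstract frames the problem as choosing a few explanatory variables that minimize mean squared error, subject to the fitted model passing assumption checks. The snippet below is a minimal sketch of that idea, assuming the number of candidate variables is small enough for exhaustive enumeration, and using a variance inflation factor (VIF) cap of 10 as a hypothetical stand-in for one assumption check (multicollinearity). It is not the paper's mathematical programming formulation: the authors solve an optimization model rather than enumerating subsets, and the names `fit_ols`, `max_vif`, `best_subset`, and `vif_limit` are illustrative only.

```python
import itertools
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, in-sample MSE)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(np.mean(resid ** 2))

def max_vif(X):
    """Largest variance inflation factor among the columns of X (hypothetical check)."""
    if X.shape[1] < 2:
        return 1.0  # a single regressor cannot be collinear with another
    vifs = []
    for j in range(X.shape[1]):
        # Regress column j on the remaining columns; R^2 = 1 - SSE/SST.
        _, mse = fit_ols(np.delete(X, j, axis=1), X[:, j])
        r2 = 1.0 - mse / np.var(X[:, j])
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return max(vifs)

def best_subset(X, y, k, vif_limit=10.0):
    """Exhaustive search over subsets of at most k columns. Returns (columns, MSE)
    for the lowest-MSE subset whose max VIF is within vif_limit, or None."""
    best = None
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(X.shape[1]), size):
            Xs = X[:, list(cols)]
            if max_vif(Xs) > vif_limit:
                continue  # subset fails the stand-in multicollinearity check
            _, mse = fit_ols(Xs, y)
            if best is None or mse < best[1]:
                best = (cols, mse)
    return best

# Synthetic example: x3 is nearly a copy of x1, and y depends only on x1 and x2.
rng = np.random.default_rng(0)
n = 200
x1, x2, x4 = rng.normal(size=(3, n))
x3 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3, x4])
y = 2.0 * x1 - 1.0 * x2 + 0.1 * rng.normal(size=n)
print(best_subset(X, y, k=2))
```

Filtering candidates before comparing errors mirrors the "selection plus validation" coupling described above; an optimization-based approach like the one the paper proposes would instead encode such checks as constraints so that enumeration is unnecessary.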
Taxonomy
Methods: Linear Regression
