A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation

Seokhyun Chung; Young Woong Park; Taesu Cheong

arXiv:1712.04543·stat.ML·September 4, 2020·Pattern Recognit.

TL;DR

This paper introduces an automated mathematical programming method that integrates subset selection and validation for multiple linear regression. The method minimizes errors while satisfying key regression assumptions, improving on traditional trial-and-error approaches.

Contribution

The paper presents a novel integrated model that uses mathematical programming to automate both subset selection and validation for multiple linear regression, so that the selected model satisfies the regression assumptions more reliably.

Findings

01 Outperforms benchmark models in assumption satisfaction

02 Maintains comparable explanatory power

03 Provides alternative subsets when assumptions cannot be fully met

Abstract

Subset selection for multiple linear regression aims to construct a regression model that minimizes errors by selecting a small number of explanatory variables. Once a model is built, various statistical tests and diagnostics are conducted to validate the model and to determine whether the regression assumptions are met. Most traditional approaches require human decisions at this step. For example, the user adds or removes variables until a satisfactory model is obtained. However, this trial-and-error strategy cannot guarantee that a subset that minimizes the errors while satisfying all regression assumptions will be found. In this paper, we propose a fully automated model-building procedure for multiple linear regression subset selection that integrates model building and validation based on mathematical programming. The proposed model minimizes mean squared errors while ensuring…
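To make the idea of searching for the lowest-error subset among only those that pass validation more concrete, here is a minimal Python sketch. It is not the paper's mathematical programming formulation: it enumerates subsets by brute force, which is only practical for a handful of variables. The validation step is a stand-in chosen for illustration (a pairwise correlation cap as a rough multicollinearity check), since the abstract as shown here is cut off before it lists the assumptions the authors enforce. The function names, the 0.9 threshold, and the synthetic data are all hypothetical.

```python
from itertools import combinations

import numpy as np


def fit_mse(X, y, cols):
    """Fit ordinary least squares with an intercept on the chosen columns
    and return the in-sample mean squared error."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(np.mean(resid ** 2))


def passes_checks(X, cols, max_abs_corr=0.9):
    """Hypothetical validation step (an assumption of this sketch):
    reject subsets that contain a highly correlated pair of variables."""
    if len(cols) < 2:
        return True
    corr = np.corrcoef(X[:, cols], rowvar=False)
    off_diag = corr[~np.eye(len(cols), dtype=bool)]
    return bool(np.max(np.abs(off_diag)) <= max_abs_corr)


def best_valid_subset(X, y, max_size):
    """Exhaustively search subsets of up to max_size variables and return
    (columns, mse) for the lowest-MSE subset that passes validation,
    or None if no subset passes."""
    best = None
    for k in range(1, max_size + 1):
        for combo in combinations(range(X.shape[1]), k):
            cols = list(combo)
            if not passes_checks(X, cols):
                continue  # validation is part of the search, not a later step
            mse = fit_mse(X, y, cols)
            if best is None or mse < best[1]:
                best = (cols, mse)
    return best


# Synthetic example: column 4 is a noisy copy of column 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 0] + 0.1 * rng.normal(size=200)
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)

subset, mse = best_valid_subset(X, y, max_size=2)
print(subset, mse)
```

Because the validity check sits inside the search loop, a subset that fails it is never returned, however low its error. This is the contrast with the trial-and-error workflow described above, where validation happens only after a model has been chosen.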

Taxonomy

Methods: Linear Regression