# Subset Selection for Multiple Linear Regression via Optimization

**Authors:** Young Woong Park, Diego Klabjan

arXiv: 1701.07920 · 2020-09-04

## TL;DR

This paper develops mathematical programming models for subset selection in multiple linear regression, along with iterative heuristic algorithms for high-dimensional cases that find high-quality solutions in relatively short time and are competitive with state-of-the-art methods.

## Contribution

The paper presents new optimization models and heuristic algorithms for subset selection, including a randomized variant of the heuristic that is derived to guarantee convergence to the global optimum.

## Key findings

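- The models find a quality solution quickly; most of the remaining solve time is spent proving optimality
- The iterative algorithms find solutions in a relatively short time and are competitive with state-of-the-art algorithms
- Using ad-hoc big M values is not recommended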
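For readers unfamiliar with the term, "big M" refers to a constant that links each regression coefficient to a binary selection variable in a mixed-integer formulation. The block below is a generic textbook form of cardinality-constrained least squares, given only as an illustration; the symbols $z_j$, $k$, and $M$ are assumptions introduced here, and the paper's own models may differ.

```latex
\begin{aligned}
\min_{\beta,\, z} \quad & \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{m} x_{ij}\,\beta_j \Bigr)^{2} \\
\text{s.t.} \quad & -M z_j \le \beta_j \le M z_j, \qquad j = 1, \dots, m, \\
& \sum_{j=1}^{m} z_j \le k, \\
& z_j \in \{0, 1\}, \qquad j = 1, \dots, m.
\end{aligned}
```

In a formulation of this kind, setting $z_j = 0$ forces $\beta_j = 0$. If $M$ is smaller than the magnitude of an optimal coefficient, the constraint cuts off the true optimum; if $M$ is far larger than needed, the linear programming relaxation becomes weak and branch-and-bound slows down. This is the usual reason an arbitrarily chosen $M$ is risky.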

## Abstract

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that trades off fitting error (explanatory power) against model complexity (number of variables selected). We build mathematical programming models for regression subset selection based on mean square and absolute errors, and minimal-redundancy-maximal-relevance criteria. The proposed models are tested using a linear-program-based branch-and-bound algorithm with tailored valid inequalities and big M values and are compared against the algorithms in the literature. For high-dimensional cases, an iterative heuristic algorithm is proposed based on the mathematical programming models and a core set concept, and a randomized version of the algorithm is derived to guarantee convergence to the global optimum. From the computational experiments, we find that our models quickly find a quality solution while the rest of the time is spent proving optimality; the iterative algorithms find solutions in a relatively short time and are competitive with state-of-the-art algorithms; and using ad-hoc big M values is not recommended.
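To make the subset selection problem concrete, here is a minimal Python sketch that solves a tiny instance by exhaustive enumeration. It is not the paper's method, which relies on mathematical programming and branch-and-bound; the function name `best_subset`, the fixed subset size `k`, and the synthetic data are all assumptions made for illustration.

```python
import itertools

import numpy as np


def best_subset(X, y, k):
    """Try every subset of exactly k columns of X and return the one with
    the lowest mean squared error of an ordinary least-squares fit.

    Hypothetical helper for illustration; enumeration is only feasible
    when the number of candidate variables is small.
    """
    n, m = X.shape
    best_idx, best_mse = None, np.inf
    for idx in itertools.combinations(range(m), k):
        # Design matrix: intercept column plus the selected variables.
        A = np.column_stack([np.ones(n), X[:, list(idx)]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        mse = float(resid @ resid) / n
        if mse < best_mse:
            best_idx, best_mse = idx, mse
    return best_idx, best_mse


# Synthetic data (assumed): only columns 1 and 4 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

idx, mse = best_subset(X, y, k=2)
print("selected columns:", idx, "mse:", round(mse, 4))
```

The number of subsets grows combinatorially with the number of candidate variables, so enumeration like this stops being practical very quickly. That growth is what motivates optimization models and heuristics for larger instances.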

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.07920/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/1701.07920/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1701.07920/full.md

---
Source: https://tomesphere.com/paper/1701.07920