On the Use of Information Criteria for Subset Selection in Least Squares Regression
Sen Tian, Clifford M. Hurvich, Jeffrey S. Simonoff

TL;DR
This paper introduces BOSS (best orthogonalized subset selection), a scalable LS-based subset selection method for linear regression, and chooses the subset size with an information criterion built on a heuristic degrees of freedom estimate.
Contribution
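The paper proposes BOSS, a novel LS-based subset selection method that performs best subset selection on an orthogonalized basis of ordered predictors and scales to large problems. It also proposes a heuristic degrees of freedom (hdf) for use in information criteria that select the subset size.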
Findings
BOSS outperforms existing LS-based methods in simulations and real data.
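BOSS is computationally efficient: with AICc-hdf as the selection rule, its cost is on the order of a single ordinary LS fit, with no repeated refitting as in cross-validation.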
The proposed AICc-hdf criterion improves subset selection accuracy.
Abstract
Least squares (LS)-based subset selection methods are popular in linear regression modeling. Best subset selection (BS) is known to be NP-hard and has a computational cost that grows exponentially with the number of predictors. Recently, Bertsimas et al. (2016) formulated BS as a mixed integer optimization (MIO) problem and largely reduced the computation overhead by using a well-developed optimization solver, but the current methodology is not scalable to very large datasets. In this paper, we propose a novel LS-based method, the best orthogonalized subset selection (BOSS) method, which performs BS upon an orthogonalized basis of ordered predictors and scales easily to large problem sizes. Another challenge in applying LS-based methods in practice is the selection rule to choose the optimal subset size k. Cross-validation (CV) requires fitting a procedure multiple times, and results in a…
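The idea of performing BS on an orthogonalized basis of ordered predictors can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `boss_path` is hypothetical, the predictors are assumed to be ordered by a forward-stepwise (Gram-Schmidt) rule, X is assumed to have full column rank with fewer columns than rows, and the AICc-hdf selection rule is left out because its definition is not given in the text above. The point it shows is that, once the basis is orthonormal, the best subset of size k is simply the k basis vectors with the largest absolute coefficients, so the whole path costs about one LS fit.

```python
import numpy as np

def boss_path(X, y):
    """Hypothetical sketch: subset selection on an orthogonalized basis.

    Assumes forward-stepwise ordering and full column rank (p < n).
    Returns the predictor order, a (p+1, p) array of coefficients in the
    original predictor space (row k = subset size k), and the RSS per size.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center so no intercept is needed
    yc = y - y.mean()

    work = Xc.copy()                 # predictors residualized on chosen ones
    remaining = list(range(p))
    order, q_cols = [], []
    for _ in range(p):
        # Assumed ordering rule: pick the residualized predictor most
        # correlated with the response (forward stepwise).
        norms = np.linalg.norm(work[:, remaining], axis=0)
        scores = np.abs(work[:, remaining].T @ yc) / norms
        j = remaining.pop(int(np.argmax(scores)))
        q = work[:, j] / np.linalg.norm(work[:, j])
        order.append(j)
        q_cols.append(q)
        if remaining:
            # Remove the new direction from the predictors still in play.
            work[:, remaining] -= np.outer(q, q @ work[:, remaining])

    Q = np.column_stack(q_cols)      # orthonormal basis, ordered
    R = Q.T @ Xc[:, order]           # Xc[:, order] = Q @ R, R upper triangular
    z = Q.T @ yc                     # LS coefficients on the orthonormal basis

    # On an orthonormal basis, the best size-k subset keeps the k largest |z|.
    rank = np.argsort(-np.abs(z))
    betas = np.zeros((p + 1, p))
    rss = np.empty(p + 1)
    tss = yc @ yc
    for k in range(p + 1):
        gamma = np.zeros(p)
        gamma[rank[:k]] = z[rank[:k]]
        betas[k, order] = np.linalg.solve(R, gamma)  # back to original space
        rss[k] = tss - gamma @ gamma
    return order, betas, rss
```

A selection rule would then pick one row of `betas` by comparing the entries of `rss` under some penalty. Which penalty to use is exactly the question the abstract raises, and the sketch deliberately stops short of it.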
Taxonomy
Topics: Multi-Criteria Decision Making · Statistical Methods and Inference · Advanced Statistical Methods and Models
