TL;DR
This paper introduces a robust version of best subset selection for sparse regression that resists outliers in both the response and the predictors. It formulates the estimator as a combinatorial optimization problem, analyzes its robustness, and reports better performance than traditional best subsets on contaminated datasets.
Contribution
It proposes a novel robust subset selection method that generalizes traditional subset selection to handle outliers in both predictors and responses, with formal robustness analysis.
Findings
Robust subsets outperform traditional best subsets under data contamination.
The method achieves a high finite-sample breakdown point.
Experimental results show performance competitive with popular robust estimators.
Abstract
The best subset selection (or "best subsets") estimator is a classic tool for sparse regression, and developments in mathematical optimization over the past decade have made it more computationally tractable than ever. Notwithstanding its desirable statistical properties, the best subsets estimator is susceptible to outliers and can break down in the presence of a single contaminated data point. To address this issue, a robust adaption of best subsets is proposed that is highly resistant to contamination in both the response and the predictors. The adapted estimator generalizes the notion of subset selection to both predictors and observations, thereby achieving robustness in addition to sparsity. This procedure, referred to as "robust subset selection" (or "robust subsets"), is defined by a combinatorial optimization problem for which modern discrete optimization methods are applied.…
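The idea of selecting a subset of both predictors and observations can be illustrated with a small brute-force sketch. This is not the paper's algorithm (the abstract says modern discrete optimization methods are applied); exhaustive enumeration is swapped in here purely for illustration and is only feasible for tiny problems. The sketch assumes a least-squares loss with no intercept, and the names `robust_subsets_bruteforce`, `k` (number of predictors kept), and `h` (number of observations kept) are hypothetical.

```python
from itertools import combinations

import numpy as np


def robust_subsets_bruteforce(X, y, k, h):
    """Exhaustively search for the k predictors and h observations
    whose least-squares fit has the smallest residual sum of squares.

    Illustrative only: cost grows as C(n, h) * C(p, k) fits.
    """
    n, p = X.shape
    best = (np.inf, None, None, None)  # (loss, predictors, observations, coef)
    for rows in combinations(range(n), h):
        r = list(rows)
        for cols in combinations(range(p), k):
            c = list(cols)
            X_sub = X[np.ix_(r, c)]
            # Least-squares fit on the selected rows and columns only.
            coef, *_ = np.linalg.lstsq(X_sub, y[r], rcond=None)
            resid = y[r] - X_sub @ coef
            loss = float(resid @ resid)
            if loss < best[0]:
                best = (loss, cols, rows, coef)
    return best


# Toy data (assumed for illustration): the response depends only on
# predictor 0, and observation 5 is contaminated by a gross outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = 2.0 * X[:, 0]
y[5] += 50.0

loss, cols, rows, coef = robust_subsets_bruteforce(X, y, k=1, h=7)
print("selected predictors:", cols)
print("retained observations:", rows)
print("coefficient:", coef)
```

In this toy setup the only zero-loss fit uses predictor 0 and leaves out the contaminated observation, so the search both selects the right predictor and discards the outlier. Setting `h` equal to the number of observations recovers ordinary best subsets, which has no way to exclude the contaminated point.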
