Simultaneous Feature Selection and Outlier Detection with Optimality Guarantees
Luca Insolia, Ana Kenney, Francesca Chiaromonte, and Giovanni Felici

TL;DR
This paper introduces a mixed-integer programming framework for high-dimensional regression that performs feature selection and outlier detection simultaneously, with provable optimality guarantees and better performance than existing heuristic methods in simulations and on real data.
Contribution
It combines a mixed-integer programming formulation with theoretical analysis to achieve robust feature selection and outlier detection in high-dimensional regression contaminated by mean-shift outliers.
Findings
Proves conditions for the robustly strong oracle property.
Achieves optimal parameter estimation and a high breakdown point.
Outperforms existing heuristic methods in simulations and real data.
Abstract
Sparse estimation methods capable of tolerating outliers have been broadly investigated in the last decade. We contribute to this research considering high-dimensional regression problems contaminated by multiple mean-shift outliers which affect both the response and the design matrix. We develop a general framework for this class of problems and propose the use of mixed-integer programming to simultaneously perform feature selection and outlier detection with provably optimal guarantees. We characterize the theoretical properties of our approach, i.e. a necessary and sufficient condition for the robustly strong oracle property, which allows the number of features to exponentially increase with the sample size; the optimal estimation of the parameters; and the breakdown point of the resulting estimates. Moreover, we provide computationally efficient procedures to tune integer…
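As a rough illustration of the problem class the abstract describes, the following is a minimal sketch of a mean-shift outlier model with cardinality constraints, written in a big-M mixed-integer form. It is not taken from the text above: the symbols $\phi$ (mean-shift vector), $k_p$ and $k_n$ (bounds on the number of selected features and flagged outliers), $z$ and $w$ (binary indicators), and $M_\beta$, $M_\phi$ (big-M bounds) are all assumptions introduced here, and the authors' actual formulation may differ.

```latex
% Assumed mean-shift model: a nonzero \phi_i marks observation i as an outlier.
\[
  y_i = x_i^\top \beta + \phi_i + \varepsilon_i, \qquad i = 1, \dots, n .
\]
% Assumed big-M mixed-integer program: z_j = 1 selects feature j,
% w_i = 1 flags observation i as an outlier.
\begin{align*}
  \min_{\beta,\, \phi,\, z,\, w} \quad
    & \sum_{i=1}^{n} \bigl( y_i - x_i^\top \beta - \phi_i \bigr)^2 \\
  \text{s.t.} \quad
    & -M_\beta\, z_j \le \beta_j \le M_\beta\, z_j, \qquad j = 1, \dots, p, \\
    & -M_\phi\, w_i \le \phi_i \le M_\phi\, w_i, \qquad i = 1, \dots, n, \\
    & \sum_{j=1}^{p} z_j \le k_p, \qquad \sum_{i=1}^{n} w_i \le k_n, \\
    & z \in \{0,1\}^p, \qquad w \in \{0,1\}^n .
\end{align*}
```

In a sketch of this kind, $k_p$ and $k_n$ would play the role of integer tuning parameters: the first caps the number of active features and the second caps the number of observations treated as outliers.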
