A data driven trimming procedure for robust classification
Marina Antolín, Eustasio Del Barrio, Jean-Michel Loubes (IMT)

TL;DR
This paper introduces a data-driven trimming method to develop robust classification rules that maintain high performance on most data points, even with outliers or disturbing observations in the training set.
Contribution
It proposes an automatic trimming procedure that simplifies classification rules and provides theoretical bounds on error rates for the trimmed data.
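The quantity being bounded, the error rate on the trimmed data, can be made concrete with 0-1 loss. The sketch below assumes a trimming proportion alpha that is fixed by hand and predictions that have already been computed. It returns the smallest error rate achievable after discarding a fraction alpha of the points. With 0-1 loss, the optimal trimming simply drops misclassified points first. The function name `trimmed_error` is hypothetical, and this illustrates the idea only; it is not the authors' estimator or their oracle bound.

```python
import math


def trimmed_error(y_true, y_pred, alpha):
    """Smallest 0-1 error rate achievable by discarding a fraction alpha of the points.

    Hypothetical helper for illustration. With 0-1 loss, the best trimming
    removes misclassified points first, so the trimmed error equals
    max(0, errors - k) / (n - k), where k = floor(alpha * n).
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    k = math.floor(alpha * n)  # number of points we are allowed to discard
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    return max(0, errors - k) / (n - k)


# Usage: two mistakes (positions 4 and 7) among ten points.
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0, 0, 1, 1, 1, 1]
for a in (0.0, 0.1, 0.2):
    print(a, trimmed_error(y_true, y_pred, a))  # 0.2, then 1/9, then 0.0
```

Trimming 20% of this sample is enough to reach zero error, which shows why a bound on the trimmed error is only informative together with the trimming proportion that produced it.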
Findings
Effective trimming improves robustness against outliers.
Automatic determination of trimming proportion enhances practical applicability.
The method guarantees performance on a significant data subset.
Abstract
Classification rules can be severely affected by the presence of disturbing observations in the training sample. Looking for an optimal classifier with such data may lead to unnecessarily complex rules. Simpler yet effective classification rules can therefore be obtained by relaxing the goal of fitting a good rule to the whole training sample and considering only a fraction of the data instead. In this paper we introduce a new method based on trimming that produces classification rules with guaranteed performance on a significant fraction of the data. In particular, we provide an automatic way of determining the right trimming proportion and, in this setting, obtain oracle bounds for the classification error on the new (trimmed) data set.
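One common way to fit a rule on only part of the data is to alternate concentration steps. This technique is borrowed here from trimmed k-means and is not taken from this paper. Each step fits the rule on the retained points, scores every point, keeps the n - floor(alpha * n) points with the smallest loss, and repeats until the retained set stops changing. The sketch below applies this to a nearest-centroid classifier, using the squared distance to the point's own class centroid as the loss. Under this choice, no step can increase the trimmed objective. The trimming proportion `alpha` is fixed by hand because the abstract does not describe how the automatic choice is made. All function names are hypothetical.

```python
import math


def _centroids(points, labels):
    """Per-class mean of the given points (each point is a tuple of numbers)."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}


def _sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def _fit(points, labels, idx, classes):
    """Fit class centroids on the retained indices; refuse to lose a whole class."""
    cents = _centroids([points[i] for i in idx], [labels[i] for i in idx])
    if set(cents) != classes:
        raise ValueError("trimming removed an entire class; lower alpha")
    return cents


def trimmed_nearest_centroid(points, labels, alpha, max_iter=50):
    """Hypothetical trimmed nearest-centroid fit via concentration steps."""
    if not 0 <= alpha < 1:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(points)
    h = n - math.floor(alpha * n)  # number of points kept after trimming
    classes = set(labels)
    kept = list(range(n))  # start from the full sample
    for _ in range(max_iter):
        cents = _fit(points, labels, kept, classes)
        # Rank all points by squared distance to the centroid of their own class.
        order = sorted(range(n), key=lambda i: _sq_dist(points[i], cents[labels[i]]))
        new_kept = sorted(order[:h])
        if new_kept == kept:  # retained set is stable: stop
            break
        kept = new_kept
    return _fit(points, labels, kept, classes), kept


def predict(cents, x):
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda y: _sq_dist(x, cents[y]))


# Toy example: two clusters plus one far-away point labeled 0 (an outlier).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (4, 4), (4, 5), (5, 4), (5, 5), (20, 20)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]
full, _ = trimmed_nearest_centroid(points, labels, alpha=0.0)
trim, kept = trimmed_nearest_centroid(points, labels, alpha=0.15)
print(predict(full, (4, 4)), predict(trim, (4, 4)), kept)  # 0 1 [0, 1, 2, 3, 4, 5, 6, 7]
```

Without trimming, the single outlier drags the class-0 centroid to (4.4, 4.4), so the point (4, 4) is assigned to class 0. Trimming one point (alpha = 0.15 with n = 9) discards the outlier and restores the expected assignment. The trimming here is global rather than per class, so a large alpha could remove a whole class; the sketch raises an error in that case instead of silently dropping it.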
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Control Systems and Identification
