A data driven trimming procedure for robust classification

Marina Antolín; Eustasio Del Barrio; Jean-Michel Loubes (IMT)

arXiv:1701.05065·math.ST·January 19, 2017·2 cites

Open Access

TL;DR

This paper introduces a data-driven trimming method to develop robust classification rules that maintain high performance on most data points, even with outliers or disturbing observations in the training set.

Contribution

It proposes an automatic trimming procedure that simplifies classification rules and provides theoretical bounds on error rates for the trimmed data.

Findings

1. Effective trimming improves robustness against outliers.

2. Automatic determination of trimming proportion enhances practical applicability.

3. The method guarantees performance on a significant data subset.

Abstract

Classification rules can be severely affected by the presence of disturbing observations in the training sample. Looking for an optimal classifier with such data may lead to unnecessarily complex rules. Simpler, effective classification rules could therefore be achieved if we relax the goal of fitting a good rule for the whole training sample and instead consider only a fraction of the data. In this paper we introduce a new method based on trimming to produce classification rules with guaranteed performance on a significant fraction of the data. In particular, we provide an automatic way of determining the right trimming proportion and obtain, in this setting, oracle bounds for the classification error on the new data set.
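To make the idea of trimming concrete, the following is a minimal sketch, not the authors' procedure. It assumes a fixed classifier whose per-point 0/1 losses are already known, computes the error on the data that remains after discarding the worst-fitted fraction `alpha`, and picks `alpha` from a grid by minimizing trimmed error plus a penalty. The function names `trimmed_error` and `select_alpha`, the grid of candidate proportions, and the linear penalty in the usage lines are all hypothetical; the abstract does not specify the penalty or the selection rule.

```python
import numpy as np


def trimmed_error(losses, alpha):
    """Mean loss after discarding the floor(alpha * n) points with the largest loss."""
    losses = np.sort(np.asarray(losses, dtype=float))  # ascending order
    n = losses.size
    # Small epsilon guards against floating-point results like 1.9999999 for alpha * n.
    k = int(np.floor(alpha * n + 1e-9))
    if k >= n:
        raise ValueError("alpha must leave at least one observation untrimmed")
    return float(losses[: n - k].mean())


def select_alpha(losses, alphas, penalty):
    """Pick the trimming proportion minimizing trimmed error + penalty(alpha).

    `penalty` is a user-supplied function (an assumption of this sketch);
    without it, the largest trimming level would always look best.
    """
    scores = [trimmed_error(losses, a) + penalty(a) for a in alphas]
    best = int(np.argmin(scores))
    return alphas[best], scores[best]


# Eight training points, two of them misclassified by the fixed rule.
losses = [0, 0, 0, 0, 0, 0, 1, 1]
alphas = [0.0, 0.25, 0.5]
best_alpha, best_score = select_alpha(losses, alphas, penalty=lambda a: 0.5 * a)
print(best_alpha, best_score)
```

In this toy case, trimming a quarter of the sample removes both misclassified points, and the penalty keeps the procedure from trimming more than that. In practice the classifier would be refitted for each trimming level rather than held fixed.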

Taxonomy

Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Control Systems and Identification