Learning with Bad Training Data via Iterative Trimmed Loss Minimization

Yanyao Shen; Sujay Sanghavi

arXiv:1810.11874 · cs.LG · February 20, 2019 · 62 cites

Open Access

TL;DR

This paper introduces an iterative trimmed loss minimization framework for learning from corrupted training data. It alternates between selecting the samples with the lowest current loss and retraining on only those samples, with convergence guarantees for generalized linear models and experiments on deep image classifiers and generative adversarial networks.

Contribution

The paper proposes an iterative method for robust learning from corrupted data, proves that it recovers the ground truth with a linear convergence rate in generalized linear models, and reports that it outperforms existing approaches in multiple scenarios.

Findings

01 Recovers ground truth with linear convergence in generalized linear models.

02 Achieves state-of-the-art results on label noise without prior clean data.

03 Effective against adversarial training data, including backdoor attacks.

Abstract

In this paper, we study a simple and generic framework to tackle the problem of learning model parameters when a fraction of the training samples are corrupted. We first make a simple observation: in a variety of such settings, the evolution of training accuracy (as a function of training epochs) is different for clean and bad samples. Based on this we propose to iteratively minimize the trimmed loss, by alternating between (a) selecting samples with lowest current loss, and (b) retraining a model on only these samples. We prove that this process recovers the ground truth (with linear convergence rate) in generalized linear models with standard statistical assumptions. Experimentally, we demonstrate its effectiveness in three settings: (a) deep image classifiers with errors only in labels, (b) generative adversarial networks with bad training images, and (c) deep image classifiers with…
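The alternating procedure described in the abstract can be sketched on a toy problem. The example below is a minimal illustration, not the authors' implementation: it assumes a linear model with squared loss, a known fraction of samples to keep, and synthetic data in which 20% of the responses are replaced by random values. The names `iterative_trimmed_least_squares`, `make_corrupted_data`, `keep_fraction`, and `n_rounds` are hypothetical and do not come from the paper.

```python
import numpy as np


def iterative_trimmed_least_squares(X, y, keep_fraction=0.8, n_rounds=10):
    """Alternate between (a) keeping the lowest-loss samples and
    (b) refitting the model on only those samples.

    Assumption: squared loss on a linear model, with keep_fraction given.
    """
    n_keep = int(keep_fraction * X.shape[0])
    # Initial fit on all samples, clean and corrupted alike.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    keep = np.arange(X.shape[0])
    for _ in range(n_rounds):
        # Step (a): per-sample loss under the current model.
        losses = (X @ theta - y) ** 2
        keep = np.argsort(losses)[:n_keep]
        # Step (b): retrain on the selected subset only.
        theta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return theta, keep


def make_corrupted_data(n=500, d=5, bad_fraction=0.2, seed=0):
    """Synthetic data (an assumption for this sketch): a linear model with
    small noise, where a fraction of responses is replaced by random values."""
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ theta_true + 0.01 * rng.normal(size=n)
    n_bad = int(bad_fraction * n)
    bad = rng.choice(n, size=n_bad, replace=False)
    y[bad] = rng.normal(scale=5.0, size=n_bad)  # responses unrelated to X
    return X, y, theta_true, bad


if __name__ == "__main__":
    X, y, theta_true, bad = make_corrupted_data()
    theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
    theta_itlm, keep = iterative_trimmed_least_squares(X, y)
    print("plain least squares error:", np.linalg.norm(theta_ls - theta_true))
    print("iterative trimmed error:  ", np.linalg.norm(theta_itlm - theta_true))
    print("corrupted samples still kept:", np.intersect1d(keep, bad).size)
```

In the deep learning settings the abstract mentions, step (b) would presumably be a full retraining of the network rather than a closed-form solve, and the fraction to keep would have to be chosen or estimated; both are outside the scope of this sketch.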

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications