Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels

Yikai Wang, Xinwei Sun, and Yanwei Fu

arXiv:2203.07788·cs.LG·March 22, 2022

TL;DR

This paper introduces a scalable penalized regression framework for detecting and removing noisy labels in training data, improving neural network robustness and generalization, especially on large and complex datasets.

Contribution

The paper proposes a novel scalable penalized regression method with a split algorithm for noisy label detection, supported by theoretical guarantees and combined with semi-supervised learning.

Findings

01. Effective noisy label detection on benchmark datasets

02. Improved neural network robustness and accuracy

03. Scalable to large datasets with many categories

Abstract

A noisy training set usually degrades the generalization and robustness of neural networks. In this paper, we propose using a theoretically guaranteed noisy label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL). Specifically, we design a penalized regression to model the linear relation between network features and one-hot labels, where the noisy data are identified by the non-zero mean shift parameters solved in the regression model. To make the framework scalable to datasets that contain a large number of categories and training data, we propose a split algorithm to divide the whole training set into small pieces that can be solved by the penalized regression in parallel, leading to the Scalable Penalized Regression (SPR) framework. We provide a non-asymptotic probabilistic condition for SPR to correctly identify the noisy data.…
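The mean shift idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the objective 0.5 * ||Y - X beta - gamma||_F^2 + lam * sum_i ||gamma_i||_2, where row gamma_i is the mean shift of sample i, and solves it by simple alternating minimization; the row-wise group penalty, the solver, the fixed `lam`, and the random split are all assumptions made for illustration, and the function names `detect_noisy` and `detect_noisy_split` are hypothetical.

```python
import numpy as np


def detect_noisy(X, Y, lam=0.5, n_iter=50):
    """Flag samples whose mean shift row gamma_i is non-zero.

    X: (n, d) network features, Y: (n, c) one-hot (possibly noisy) labels.
    Assumed objective: 0.5*||Y - X@beta - gamma||_F^2 + lam*sum_i ||gamma_i||_2.
    """
    gamma = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        # Step 1: with gamma fixed, beta is an ordinary least-squares fit.
        beta, *_ = np.linalg.lstsq(X, Y - gamma, rcond=None)
        # Step 2: with beta fixed, gamma is a row-wise soft-threshold
        # of the residual (rows with norm <= lam shrink to exactly zero).
        resid = Y - X @ beta
        norms = np.linalg.norm(resid, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        gamma = shrink * resid
    noisy_mask = np.linalg.norm(gamma, axis=1) > 0
    return noisy_mask, gamma


def detect_noisy_split(X, Y, n_pieces, lam=0.5, seed=0):
    """Run the regression independently on pieces of the training set.

    The random split is an assumption; the pieces do not interact,
    so the loop below could be dispatched to parallel workers.
    """
    rng = np.random.default_rng(seed)
    pieces = np.array_split(rng.permutation(len(X)), n_pieces)
    mask = np.zeros(len(X), dtype=bool)
    for idx in pieces:
        piece_mask, _ = detect_noisy(X[idx], Y[idx], lam=lam)
        mask[idx] = piece_mask
    return mask
```

Calling `detect_noisy(X, Y)` returns a boolean mask of suspected noisy samples together with the fitted mean shifts. In practice `lam` controls how many samples are flagged, so it would need to be tuned or swept rather than left at the default used here.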

Code & Models

Repositories

yikai-wang/spr-lnl (PyTorch, official)

Taxonomy

Topics: Machine Learning and Data Classification