Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels
Yikai Wang, Xinwei Sun, and Yanwei Fu

TL;DR
This paper introduces a scalable penalized regression framework for detecting and removing noisy labels in training data, improving neural network robustness and generalization, especially on large and complex datasets.
Contribution
The paper proposes a novel scalable penalized regression method with a split algorithm for noisy label detection, supported by theoretical guarantees and combined with semi-supervised learning.
Findings
Effective noisy label detection on benchmark datasets
Improved neural network robustness and accuracy
Scalable to large datasets with many categories
Abstract
A noisy training set usually degrades the generalization and robustness of neural networks. In this paper, we propose a theoretically guaranteed noisy label detection framework to detect and remove noisy data for Learning with Noisy Labels (LNL). Specifically, we design a penalized regression to model the linear relation between network features and one-hot labels, where the noisy data are identified by the non-zero mean shift parameters solved in the regression model. To make the framework scalable to datasets that contain a large number of categories and training examples, we propose a split algorithm that divides the whole training set into small pieces that the penalized regression can solve in parallel, leading to the Scalable Penalized Regression (SPR) framework. We provide the non-asymptotic probabilistic condition for SPR to correctly identify the noisy data.…
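The mean-shift idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the model Y = XB + G + noise with a row-wise group penalty on G, solved by simple alternating minimization. The function names `detect_noisy` and `detect_noisy_split`, the penalty weight `lam`, the strided split, and the synthetic data are all hypothetical choices made for illustration; the abstract does not specify the solver or how the pieces are formed.

```python
import numpy as np

def detect_noisy(X, Y, lam, n_iter=50):
    """Flag rows whose mean-shift parameter is non-zero.

    Minimizes 0.5 * ||Y - X B - G||_F^2 + lam * sum_i ||G_i||_2
    by alternating between B (least squares) and G (row-wise shrinkage).
    Assumed objective and solver, chosen only for illustration.
    """
    G = np.zeros(Y.shape, dtype=float)
    X_pinv = np.linalg.pinv(X)
    for _ in range(n_iter):
        B = X_pinv @ (Y - G)                      # least-squares fit to shifted labels
        R = Y - X @ B                             # residual of each sample
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        G = shrink * R                            # group soft-thresholding per row
    return np.linalg.norm(G, axis=1) > 0

def detect_noisy_split(X, Y, lam, n_pieces):
    """Run the detector independently on strided pieces of the data.

    The strided split is a hypothetical stand-in for the paper's split
    algorithm; it assumes rows are not sorted by class. The pieces are
    independent, so they could be processed in parallel.
    """
    n = X.shape[0]
    flagged = np.zeros(n, dtype=bool)
    for k in range(n_pieces):
        piece = np.arange(k, n, n_pieces)
        flagged[piece] = detect_noisy(X[piece], Y[piece], lam)
    return flagged

# Synthetic example: features close to the one-hot encoding of the true class.
rng = np.random.default_rng(0)
n, c = 200, 3
true_cls = np.arange(n) % c
X = np.eye(c)[true_cls] + 0.02 * rng.normal(size=(n, c))

# Corrupt ten labels by shifting them to the next class.
noisy_idx = np.arange(0, 190, 19)
labels = true_cls.copy()
labels[noisy_idx] = (labels[noisy_idx] + 1) % c
Y = np.eye(c)[labels]

flagged = detect_noisy(X, Y, lam=0.5)
flagged_split = detect_noisy_split(X, Y, lam=0.5, n_pieces=4)
print(np.flatnonzero(flagged))
```

In this toy setting, clean samples have small residuals that the penalty shrinks to exactly zero, while mislabeled samples keep a non-zero shift. In practice the features would come from a trained network, and `lam` would need to be selected rather than fixed by hand.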
Taxonomy
Topics: Machine Learning and Data Classification
