Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels

Yikai Wang, Yanwei Fu, and Xinwei Sun

arXiv:2301.00545·cs.LG·November 30, 2023


TL;DR

This paper introduces a theoretically grounded framework called Knockoffs-SPR for selecting clean samples in noisy label learning, improving neural network robustness and generalization.

Contribution

The paper proposes Scalable Penalized Regression (SPR), a penalized regression method that is combined with knockoff filters (Knockoffs-SPR) to select clean samples under noisy labels while controlling the false-selection rate.

Findings

01 Effective in identifying clean data in noisy datasets

02 Controls the false-selection rate in sample filtering

03 Improves neural network robustness and accuracy

Abstract

A noisy training set usually degrades the generalization and robustness of neural networks. In this paper, we propose a novel, theoretically guaranteed clean sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method to model the linear relation between network features and one-hot labels. In SPR, the clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover clean data under some conditions. In general scenarios, these conditions may no longer be satisfied, and some noisy data are falsely selected as clean data. To solve this problem, we propose a data-adaptive method for Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which provably controls the False-Selection-Rate (FSR) in the selected…
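The SPR step described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a mean-shift model Y = Xβ + γ + ε with a row-wise group-lasso penalty on γ, solved by alternating a ridge-stabilised least-squares update for β with a soft-threshold update for γ. The penalty, the solver, the function names `mean_shift_select` and `make_toy_data`, and the parameters `lam`, `n_iter`, and `ridge` are all assumptions made for illustration. The knockoff-filter step is not shown.

```python
import numpy as np


def mean_shift_select(X, Y, lam, n_iter=200, ridge=1e-6):
    """Flag samples whose estimated mean-shift row gamma_i is exactly zero.

    Assumed objective (illustrative, not taken from the paper's text):
        0.5 * ||Y - X @ beta - gamma||_F^2 + lam * sum_i ||gamma_i||_2
    """
    n, p = X.shape
    gamma = np.zeros_like(Y, dtype=float)
    # Ridge-stabilised least-squares operator, computed once.
    A = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T)
    for _ in range(n_iter):
        beta = A @ (Y - gamma)            # beta update with gamma fixed
        R = Y - X @ beta                  # residuals with beta fixed
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # Row-wise group soft-thresholding: rows with norm <= lam become 0.
        scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        gamma = scale * R
    clean = np.linalg.norm(gamma, axis=1) == 0.0
    return clean, gamma


def make_toy_data(n=300, p=8, c=3, noise_rate=0.2, seed=0):
    """Hypothetical toy data: class-clustered features, some labels flipped."""
    rng = np.random.default_rng(seed)
    y_true = rng.integers(0, c, size=n)
    # Each class has its own axis-aligned centre plus small Gaussian noise.
    X = 3.0 * np.eye(c, p)[y_true] + 0.1 * rng.standard_normal((n, p))
    flip = rng.random(n) < noise_rate
    shift = rng.integers(1, c, size=n)    # non-zero shift gives a wrong class
    y_obs = np.where(flip, (y_true + shift) % c, y_true)
    Y = np.eye(c)[y_obs]                  # one-hot observed labels
    return X, Y, ~flip


if __name__ == "__main__":
    X, Y, is_clean = make_toy_data()
    selected, _ = mean_shift_select(X, Y, lam=0.5)
    print("selected as clean:", int(selected.sum()), "of", len(selected))
    print("truly clean among selected:", int((selected & is_clean).sum()))
```

In this sketch `lam` sets the residual norm below which a sample is treated as clean, so a larger value selects more samples at the cost of admitting more noisy ones. That trade-off is what the abstract's false-selection-rate control is meant to handle in a data-adaptive way.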


Code & Models

Repositories

yikai-wang/knockoffs-spr
PyTorch · Official


Taxonomy

Topics: Machine Learning and Data Classification · Image and Signal Denoising Methods · Neural Networks and Applications