Step-E: A Differentiable Data Cleaning Framework for Robust Learning with Noisy Labels
Wenzhang Du

TL;DR
Step-E is an integrated, differentiable data cleaning framework that dynamically excludes noisy and outlier samples during training, significantly improving neural network robustness and accuracy on noisy datasets.
Contribution
It introduces a novel online curriculum approach that combines sample selection and model training into a single optimization process, outperforming loss truncation, self-paced learning, and one-shot filtering baselines.
Findings
Significantly improves test accuracy on the CIFAR-100N and CIFAR-10N datasets (e.g., ResNet-18 on CIFAR-100N: 43.3% (+/- 0.7%) to 50.4% (+/- 0.9%)).
Outperforms loss truncation, self-paced learning, and one-shot filtering methods.
Nearly matches the performance of a clean-label oracle.
Abstract
Training data collected in the wild often contain noisy labels and outliers that substantially degrade the performance and reliability of deep neural networks. While data cleaning is commonly applied as a separate preprocessing stage, such two-stage pipelines neither fully exploit feedback from the downstream model nor adapt to unknown noise patterns. We propose Step-E, a simple framework that integrates sample selection and model learning into a single optimization process. At each epoch, Step-E ranks samples by loss and gradually increases the fraction of high-loss examples that are excluded from gradient updates after a brief warm-up stage, yielding an online curriculum that focuses on easy and consistent examples and eventually ignores persistent outliers. On CIFAR-100N, Step-E improves the test accuracy of a ResNet-18 model from 43.3% (+/- 0.7%) to 50.4% (+/- 0.9%), clearly…
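The per-epoch procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation, and it rests on three assumptions:

- The exclusion fraction ramps linearly after warm-up.
- Logistic regression stands in for the ResNet-18.
- The names `exclusion_rate`, `keep_mask` and `train_step_e`, and the parameters `warmup_epochs`, `ramp_epochs` and `max_rate`, are all hypothetical.

At each epoch the sketch does the following:

1. Compute per-sample losses.
2. Drop the highest-loss fraction.
3. Take a gradient step on the remaining samples only.

```python
import numpy as np

def exclusion_rate(epoch, warmup_epochs=5, ramp_epochs=10, max_rate=0.3):
    """Fraction of highest-loss samples to exclude at `epoch`.
    Hypothetical linear schedule: 0 during warm-up, then a linear ramp to max_rate."""
    if epoch < warmup_epochs:
        return 0.0
    progress = min(1.0, (epoch - warmup_epochs + 1) / ramp_epochs)
    return max_rate * progress

def keep_mask(losses, rate):
    """Boolean mask that keeps the (1 - rate) fraction of samples with the lowest loss."""
    n = len(losses)
    n_drop = int(np.floor(rate * n))
    mask = np.ones(n, dtype=bool)
    if n_drop > 0:
        # Ascending sort: the last n_drop indices are the highest-loss samples.
        drop_idx = np.argsort(losses, kind="stable")[n - n_drop:]
        mask[drop_idx] = False
    return mask

def train_step_e(X, y, epochs=30, lr=0.5, **schedule):
    """Logistic regression stand-in: rank by loss each epoch and update only on kept samples."""
    w = np.zeros(X.shape[1])
    mask = np.ones(len(y), dtype=bool)
    eps = 1e-12
    for epoch in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        # Per-sample binary cross-entropy, used for ranking.
        losses = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        mask = keep_mask(losses, exclusion_rate(epoch, **schedule))
        # Mean-loss gradient over kept samples; excluded samples get no update.
        grad = X[mask].T @ (p[mask] - y[mask]) / mask.sum()
        w -= lr * grad
    return w, mask

# Toy usage: a linearly separable problem with 20% of labels flipped.
rng = np.random.default_rng(0)
n = 400
X = np.c_[rng.normal(size=(n, 2)), np.ones(n)]  # two features plus a bias column
y_clean = (X[:, 0] + X[:, 1] > 0).astype(float)
flip = rng.random(n) < 0.2
y_noisy = np.where(flip, 1 - y_clean, y_clean)
w, last_mask = train_step_e(X, y_noisy, epochs=30)
print("fraction of flipped labels among excluded samples:", flip[~last_mask].mean())
```

The sketch uses a rank-based cutoff instead of a fixed loss threshold. This keeps the number of excluded samples predictable under the schedule, whatever the scale of the loss. Keeping `max_rate` below 1 guarantees that every update has at least one sample.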
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
