Step-E: A Differentiable Data Cleaning Framework for Robust Learning with Noisy Labels

Wenzhang Du

arXiv:2511.17040 · cs.LG · November 24, 2025

TL;DR

Step-E is an integrated, differentiable data cleaning framework that dynamically excludes noisy and outlier samples during training, significantly improving neural network robustness and accuracy on noisy datasets.

Contribution

Step-E introduces an online curriculum that combines sample selection and model training into a single optimization process, outperforming loss truncation, self-paced learning, and one-shot filtering.

Findings

01. Improves test accuracy on CIFAR-100N and CIFAR-10N; on CIFAR-100N, a ResNet-18 rises from 43.3% (+/- 0.7%) to 50.4% (+/- 0.9%).

02. Outperforms loss truncation, self-paced learning, and one-shot filtering.

03. Nearly matches the performance of a clean-label oracle.

Abstract

Training data collected in the wild often contain noisy labels and outliers that substantially degrade the performance and reliability of deep neural networks. While data cleaning is commonly applied as a separate preprocessing stage, such two-stage pipelines neither fully exploit feedback from the downstream model nor adapt to unknown noise patterns. We propose Step-E, a simple framework that integrates sample selection and model learning into a single optimization process. At each epoch, Step-E ranks samples by loss and gradually increases the fraction of high-loss examples that are excluded from gradient updates after a brief warm-up stage, yielding an online curriculum that focuses on easy and consistent examples and eventually ignores persistent outliers. On CIFAR-100N, Step-E improves the test accuracy of a ResNet-18 model from 43.3% (+/- 0.7%) to 50.4% (+/- 0.9%), clearly…
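The selection step described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the linear ramp, the hard keep/drop mask, and the names `exclusion_fraction`, `keep_mask`, `masked_mean_loss`, `warmup_epochs`, `ramp_epochs`, and `max_fraction` are all hypothetical, and the differentiable formulation referred to in the title is not reproduced here.

```python
import math


def exclusion_fraction(epoch, warmup_epochs=5, ramp_epochs=20, max_fraction=0.3):
    """Fraction of highest-loss samples to exclude at a 0-indexed epoch.

    Assumption: nothing is excluded during warm-up, then the fraction
    grows linearly to max_fraction. The paper's actual schedule and
    hyperparameter values are not given in the abstract.
    """
    if epoch < warmup_epochs:
        return 0.0
    progress = min(1.0, (epoch - warmup_epochs + 1) / ramp_epochs)
    return max_fraction * progress


def keep_mask(losses, fraction):
    """Return a list of booleans: True for samples kept for the update.

    Samples are ranked by loss and the top `fraction` (rounded down)
    are marked as excluded.
    """
    n_drop = int(math.floor(fraction * len(losses)))
    # Indices sorted from highest loss to lowest loss.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    dropped = set(order[:n_drop])
    return [i not in dropped for i in range(len(losses))]


def masked_mean_loss(losses, mask):
    """Average loss over kept samples only (what would be backpropagated)."""
    kept = [loss for loss, keep in zip(losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    per_sample_losses = [0.1, 2.0, 0.3, 5.0]  # made-up values for illustration
    for epoch in (0, 5, 30):
        frac = exclusion_fraction(epoch)
        mask = keep_mask(per_sample_losses, frac)
        print(epoch, round(frac, 3), mask, masked_mean_loss(per_sample_losses, mask))
```

In a real training loop the per-sample losses would come from the model's forward pass, and only the kept samples would contribute to the gradient update. The excluded set is recomputed from fresh losses, so a sample dropped at one epoch can re-enter later if its loss falls.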

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning