Batch Loss Score for Dynamic Data Pruning

Qing Zhou; Bingxuan Zhao; Tao Yang; Hongyuan Zhang; Junyu Gao; Qi Wang

arXiv:2604.04681 · cs.LG · April 7, 2026

TL;DR

The paper introduces Batch Loss Score (BLS), a simple, efficient proxy for per-sample importance in dynamic data pruning, enabling significant sample reduction across diverse datasets and models.

Contribution

BLS provides a theoretically grounded, easy-to-implement method for importance scoring that enhances existing pruning techniques without complex computations.
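To show how a per-sample score can plug into an existing pruning technique, here is a minimal sketch of InfoBatch-style soft pruning. InfoBatch is a separate, previously published method and is used here only as an assumed example. It accepts any dictionary of per-sample scores, such as BLS scores. The function name `soft_prune` and its parameters are hypothetical and do not come from the BLS paper or its repository. Samples scoring below the mean are randomly skipped for one epoch. The surviving low-score samples are up-weighted so that their expected total weight stays the same.

```python
import random
from typing import Dict, List, Tuple


def soft_prune(
    scores: Dict[int, float],
    prune_ratio: float,
    rng: random.Random,
) -> Tuple[List[int], Dict[int, float]]:
    """Hypothetical InfoBatch-style soft pruning driven by per-sample scores.

    Samples scoring below the mean are dropped with probability
    `prune_ratio` for this epoch only; low-score samples that survive get a
    loss weight of 1 / (1 - prune_ratio), so their expected weight is 1.
    Samples at or above the mean are always kept with weight 1.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    if not scores:
        return [], {}
    mean_score = sum(scores.values()) / len(scores)
    kept: List[int] = []
    weights: Dict[int, float] = {}
    for idx, score in scores.items():
        if score < mean_score:
            if rng.random() < prune_ratio:
                continue  # skipped this epoch; it can be drawn again next epoch
            weights[idx] = 1.0 / (1.0 - prune_ratio)
        else:
            weights[idx] = 1.0
        kept.append(idx)
    return kept, weights


if __name__ == "__main__":
    # Toy scores; in a BLS setup these would be EMA batch-loss scores.
    toy_scores = {0: 0.1, 1: 0.2, 2: 0.9, 3: 1.5}
    kept, weights = soft_prune(toy_scores, prune_ratio=0.5, rng=random.Random(0))
    print(kept, weights)
```

The pruning rule only needs a score per sample. This is why a cheaper score source can replace per-sample losses without changing the selection logic.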

Findings

1. BLS enables pruning of 20-50% of training samples across 14 datasets and 11 tasks.

2. It integrates into existing training code with a three-line code injection.

3. BLS approximates each sample's individual loss contribution with an exponential moving average (EMA) of batch losses.
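Below is a minimal sketch of the EMA idea. It assumes that every sample in a batch receives the scalar, mean-reduced batch loss as its observation. It also assumes that each sample's score is updated as `beta * old + (1 - beta) * batch_loss` whenever that sample is drawn. The class `BatchLossScorer`, the `beta` default, the first-observation initialization, and the `simulate` toy setup are all hypothetical. They are not the authors' code. The simulation hides fixed per-sample losses, reshuffles batches every epoch, and checks whether scores built only from batch losses still track the hidden individual losses.

```python
import random
from typing import Dict, Iterable, List, Tuple


class BatchLossScorer:
    """Hypothetical helper: per-sample scores built only from batch losses.

    Every sample in a batch receives the same scalar observation (the batch
    loss), and each sample keeps its own exponential moving average of the
    batch losses it has been part of.
    """

    def __init__(self, beta: float = 0.9) -> None:
        if not 0.0 <= beta < 1.0:
            raise ValueError("beta must be in [0, 1)")
        self.beta = beta
        self.scores: Dict[int, float] = {}

    def update(self, indices: Iterable[int], batch_loss: float) -> None:
        for idx in indices:
            prev = self.scores.get(idx)
            if prev is None:
                # The first observation initializes the average (avoids zero bias).
                self.scores[idx] = batch_loss
            else:
                self.scores[idx] = self.beta * prev + (1.0 - self.beta) * batch_loss


def simulate(
    num_samples: int = 64,
    batch_size: int = 8,
    epochs: int = 200,
    beta: float = 0.95,
    seed: int = 0,
) -> Tuple[List[float], Dict[int, float]]:
    """Toy check: fixed hidden per-sample losses, reshuffled batches each epoch."""
    rng = random.Random(seed)
    # "True" individual losses; the scorer never sees them directly.
    true_loss = [rng.uniform(0.0, 2.0) for _ in range(num_samples)]
    scorer = BatchLossScorer(beta)
    order = list(range(num_samples))
    for _ in range(epochs):
        rng.shuffle(order)  # new batch composition, i.e. new "noise", each epoch
        for start in range(0, num_samples, batch_size):
            batch = order[start:start + batch_size]
            batch_loss = sum(true_loss[i] for i in batch) / len(batch)  # mean reduction
            scorer.update(batch, batch_loss)
    return true_loss, scorer.scores
```

In a real training loop, the additions would amount to three steps. First, construct the scorer. Second, call `update` with the batch's dataset indices and the scalar batch loss. Third, read `scores` when selecting samples. This sketch does not claim that these match the three lines in the authors' repository. Setting `beta` to 0 reduces each score to the last batch loss the sample saw, which is far noisier.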

Abstract

Dynamic data pruning accelerates deep learning by selectively omitting less informative samples during training. While per-sample loss is a common importance metric, obtaining it can be challenging or infeasible for complex models or loss functions, often requiring significant implementation effort. This work proposes the Batch Loss Score (BLS), a computationally efficient alternative using an Exponential Moving Average (EMA) of readily available batch losses to assign scores to individual samples. We frame the batch loss, from the perspective of a single sample, as a noisy measurement of its scaled individual loss, with noise originating from stochastic batch composition. It is formally shown that the EMA mechanism functions as a first-order low-pass filter, attenuating high-frequency batch composition noise. This yields a score approximating the smoothed and persistent contribution of…
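The abstract states the noise model and the low-pass claim without the math. Below is a sketch of the standard argument, not the paper's own proof. It assumes a mean-reduced loss over a batch of size b. It also treats the sequence of batch losses that sample i sees as a discrete-time signal. The notation (b, β, n_{i,t}, s_{i,t}) is introduced here for illustration. The noise term has a nonzero mean, but that mean is close to the same for all samples, so it mainly shifts scores by a shared offset. The ranking is then driven by the slowly varying term ℓ_i / b.

```latex
% Assumption: mean-reduced loss over batch \mathcal{B}_t of size b, with i \in \mathcal{B}_t
L_{\mathcal{B}_t} = \frac{1}{b}\sum_{j \in \mathcal{B}_t} \ell_j
  = \underbrace{\frac{1}{b}\,\ell_i}_{\text{scaled individual loss}}
  + \underbrace{\frac{1}{b}\sum_{j \in \mathcal{B}_t \setminus \{i\}} \ell_j}_{n_{i,t}\ \text{(batch-composition noise)}}

% EMA score, updated each time sample i is drawn (t counts those occasions)
s_{i,t} = \beta\, s_{i,t-1} + (1-\beta)\, L_{\mathcal{B}_t}, \qquad 0 \le \beta < 1

% First-order IIR filter: transfer function and magnitude response
H(z) = \frac{1-\beta}{1-\beta z^{-1}}, \qquad
|H(e^{j\omega})|^2 = \frac{(1-\beta)^2}{1 - 2\beta\cos\omega + \beta^2}

% Low-pass behavior: unit gain at DC, strong attenuation at the Nyquist frequency
|H(e^{j0})| = 1, \qquad |H(e^{j\pi})| = \frac{1-\beta}{1+\beta}

% Zero-mean white noise of variance \sigma^2 leaves the filter with variance
\sigma_{\text{out}}^2 = (1-\beta)^2 \sum_{k=0}^{\infty} \beta^{2k}\,\sigma^2
  = \frac{1-\beta}{1+\beta}\,\sigma^2
```

A larger β suppresses more of the batch-composition noise. The cost is a slower response to real changes in a sample's loss during training.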

Code & Models

Repositories

mrazhou/BLS (GitHub)