TL;DR
The paper introduces Batch Loss Score (BLS), a simple, efficient proxy for per-sample importance in dynamic data pruning. It is computed from batch losses alone and supports pruning 20-50% of training samples across diverse datasets and models.
Contribution
BLS is a theoretically grounded, easy-to-implement importance score that plugs into existing pruning techniques without requiring per-sample losses to be computed.
Findings
BLS achieves 20-50% sample pruning across 14 datasets and 11 tasks.
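It integrates into existing training code with a three-line injection.
BLS approximates each sample's individual loss contribution with an Exponential Moving Average (EMA) of the losses of the batches that contained the sample.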
Abstract
Dynamic data pruning accelerates deep learning by selectively omitting less informative samples during training. While per-sample loss is a common importance metric, obtaining it can be challenging or infeasible for complex models or loss functions, often requiring significant implementation effort. This work proposes the Batch Loss Score (BLS), a computationally efficient alternative using an Exponential Moving Average (EMA) of readily available batch losses to assign scores to individual samples. We frame the batch loss, from the perspective of a single sample, as a noisy measurement of its scaled individual loss, with noise originating from stochastic batch composition. It is formally shown that the EMA mechanism functions as a first-order low-pass filter, attenuating high-frequency batch composition noise. This yields a score approximating the smoothed and persistent contribution of…
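The scoring idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `update_bls` and `simulate`, the smoothing factor `alpha`, the choice to initialise a score with the first observed batch loss, and the synthetic per-sample losses are all hypothetical. Each time a batch is processed, every sample in it receives the same batch loss as a noisy measurement, and its score is updated as s <- alpha * s + (1 - alpha) * batch_loss.

```python
import random


def update_bls(scores, batch_indices, batch_loss, alpha=0.9):
    """EMA update of per-sample scores from a single scalar batch loss.

    Assumption: a sample seen for the first time is initialised with the
    batch loss itself; the source does not specify the initialisation.
    """
    for i in batch_indices:
        prev = scores.get(i)
        if prev is None:
            scores[i] = batch_loss
        else:
            scores[i] = alpha * prev + (1.0 - alpha) * batch_loss
    return scores


def simulate(n_samples=8, batch_size=4, epochs=50, alpha=0.9, seed=0):
    """Toy run: sample 0 has a large hidden loss, all others a small one.

    Only the batch mean is ever observed, mimicking a setting where
    per-sample losses are unavailable.
    """
    rng = random.Random(seed)
    hidden_loss = [10.0] + [1.0] * (n_samples - 1)  # synthetic, for illustration
    scores = {}
    indices = list(range(n_samples))
    for _ in range(epochs):
        rng.shuffle(indices)  # stochastic batch composition is the noise source
        for start in range(0, n_samples, batch_size):
            batch = indices[start:start + batch_size]
            batch_loss = sum(hidden_loss[i] for i in batch) / len(batch)
            update_bls(scores, batch, batch_loss, alpha)
    return scores


if __name__ == "__main__":
    for idx, score in sorted(simulate().items()):
        print(idx, round(score, 3))
```

In this toy setup the high-loss sample always sits in a batch whose mean is elevated, so its score stays at that elevated value, while the other samples only occasionally share a batch with it and their averaged scores cannot exceed it. How the scores are then turned into a pruning decision is left out here, since the summary above does not specify it.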
