Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty
Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

TL;DR
This paper introduces a novel dataset pruning method that identifies important samples early in training using example difficulty and prediction uncertainty, significantly reducing pruning time while maintaining state-of-the-art accuracy.
Contribution
The paper proposes the DUAL score and ratio-adaptive sampling to enable efficient dataset pruning without full training, outperforming existing methods in speed and accuracy.
Findings
Reduces the time cost of pruning on ImageNet-1k to 66% of that of previous methods, while reaching 60% test accuracy at a 90% pruning ratio
Reduces the time cost of pruning on CIFAR datasets to 15% while maintaining state-of-the-art performance
Maintains high accuracy despite aggressive dataset reduction
Abstract
Recent advances in deep learning rely heavily on massive datasets, leading to substantial storage and training costs. Dataset pruning aims to alleviate this demand by discarding redundant examples. However, many existing methods require training a model on the full dataset for a large number of epochs before the dataset can be pruned, which ironically makes the pruning process more expensive than simply training the model on the entire dataset. To overcome this limitation, we introduce a Difficulty and Uncertainty-Aware Lightweight (DUAL) score, which aims to identify important samples from the early training stage by considering both example difficulty and prediction uncertainty. To address a catastrophic accuracy drop under extreme pruning, we further propose ratio-adaptive sampling using a Beta distribution. Experiments on various datasets and learning scenarios such as image…
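The abstract names the two ingredients but this page does not give their formulas, so the following is only a minimal sketch under stated assumptions, not the authors' definition. It assumes difficulty can be measured as one minus the mean true-class probability over a sliding window of early epochs, uncertainty as the standard deviation over the same window, and that the two are multiplied and averaged across windows. The sampler is likewise hypothetical: it assumes a Beta-shaped weight over score ranks whose mass moves toward lower-scored examples as the pruning ratio grows. The names `dual_style_score`, `ratio_adaptive_sample`, `window`, and `sharpness` are illustrative and do not come from the paper.

```python
import numpy as np


def dual_style_score(true_class_probs, window=3):
    """Hypothetical difficulty-times-uncertainty score (an assumption, not the paper's formula).

    true_class_probs: array of shape (n_epochs, n_examples) holding the model's
    predicted probability of the correct label at each early training epoch.
    """
    probs = np.asarray(true_class_probs, dtype=float)
    n_epochs = probs.shape[0]
    if not 2 <= window <= n_epochs:
        raise ValueError("window must be between 2 and the number of epochs")
    # Sliding windows along the epoch axis: shape (n_windows, n_examples, window).
    windows = np.lib.stride_tricks.sliding_window_view(probs, window, axis=0)
    difficulty = 1.0 - windows.mean(axis=-1)       # low confidence means hard
    uncertainty = windows.std(axis=-1, ddof=1)     # fluctuating predictions mean uncertain
    # Average the per-window product to get one score per example.
    return (difficulty * uncertainty).mean(axis=0)


def ratio_adaptive_sample(scores, prune_ratio, rng, sharpness=5.0):
    """Hypothetical Beta-weighted sampler (an assumption, not the paper's scheme).

    Returns the indices of the examples to keep.
    """
    scores = np.asarray(scores, dtype=float)
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    n = scores.size
    n_keep = max(1, int(round((1.0 - prune_ratio) * n)))
    # Map each example to a position in (0, 1) by its score rank (0 = lowest score).
    ranks = scores.argsort().argsort()
    x = (ranks + 0.5) / n
    # Beta(a, b) shape: mild pruning favours high scores, heavy pruning shifts
    # the mass toward lower-scored (easier) examples.
    a = 1.0 + sharpness * (1.0 - prune_ratio)
    b = 1.0 + sharpness * prune_ratio
    weights = x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)
    weights /= weights.sum()
    return rng.choice(n, size=n_keep, replace=False, p=weights)
```

For instance, calling `dual_style_score` on probabilities logged over the first few epochs and passing the result to `ratio_adaptive_sample(scores, 0.9, np.random.default_rng(0))` would keep roughly 10% of the examples. Multiplying the two terms means an example scores highly only when it is both hard and unstable, which is one way to read "considering both example difficulty and prediction uncertainty".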
Taxonomy
Topics: Neural Networks and Applications · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
Methods: Dataset Pruning · Pruning
