Large-scale Dataset Pruning with Dynamic Uncertainty
Muyang He, Shuo Yang, Tiejun Huang, Bo Zhao

TL;DR
This paper introduces a dataset pruning method that uses prediction uncertainty and training dynamics to create smaller, informative datasets for training deep models, significantly reducing data size with minimal performance loss.
Contribution
The paper presents a novel dataset pruning approach leveraging uncertainty and training dynamics, achieving high pruning ratios while maintaining model performance.
Findings
Outperforms existing methods on large-scale datasets
Achieves a 25% lossless pruning ratio on ImageNet datasets
Reduces training data size with negligible accuracy drop
Abstract
The state of the art in many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As a result, the computational cost is becoming unaffordable. In this paper, we investigate how to prune large-scale datasets to produce an informative subset for training sophisticated deep models with a negligible performance drop. We propose a simple yet effective dataset pruning method that exploits both prediction uncertainty and training dynamics. We study dataset pruning by measuring the variation of predictions over the whole training process on large-scale datasets, i.e., ImageNet-1K and ImageNet-21K, and advanced models, i.e., Swin Transformer and ConvNeXt. Extensive experimental results indicate that our method outperforms the state of the art and achieves a 25% lossless pruning ratio on both…
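The idea of scoring samples by how much their predictions vary during training can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dynamic_uncertainty` and `prune`, the sliding-window size, the use of the true-class probability as the tracked prediction, and the choice to keep the highest-scoring samples are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def dynamic_uncertainty(true_class_probs, window=3):
    """Score one sample from its per-epoch true-class probabilities.

    Assumption: uncertainty is the standard deviation of the prediction
    inside a sliding window of epochs, averaged over all windows.
    """
    if len(true_class_probs) < window:
        raise ValueError("need at least `window` epochs of predictions")
    windows = [
        true_class_probs[i:i + window]
        for i in range(len(true_class_probs) - window + 1)
    ]
    return mean(pstdev(w) for w in windows)


def prune(scores, prune_ratio):
    """Return sorted indices of the samples kept after pruning.

    Assumption: samples with the lowest scores are dropped first.
    """
    n_keep = len(scores) - int(len(scores) * prune_ratio)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical prediction histories for four samples over five epochs.
history = [
    [0.9, 0.9, 0.9, 0.9, 0.9],  # always confident: no variation
    [0.1, 0.9, 0.1, 0.9, 0.1],  # oscillating: high variation
    [0.2, 0.4, 0.6, 0.8, 1.0],  # steadily learned
    [0.5, 0.5, 0.5, 0.6, 0.5],  # nearly flat
]
scores = [dynamic_uncertainty(h) for h in history]
kept = prune(scores, prune_ratio=0.25)
print(kept)  # [1, 2, 3]
```

With a pruning ratio of 0.25, one of the four samples is removed, and under these assumptions it is the sample whose prediction never changes. In practice the per-epoch probabilities would be logged while training a model on the full dataset once.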
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition
Methods: Dataset Pruning · Multi-Head Attention · Attention Is All You Need · Pruning · ConvNeXt · Dropout · Linear Layer · Label Smoothing · Stochastic Depth · Position-Wise Feed-Forward Layer
