Dataset Pruning: Reducing Training Data by Examining Generalization Influence
Shuo Yang, Zeke Xie, Hanyu Peng, Min Xu, Mingming Sun, Ping Li

TL;DR
This paper introduces dataset pruning, an optimization-based method to identify minimal training subsets that maintain model performance, reducing data size and training time while providing theoretical guarantees.
Contribution
It proposes a novel dataset pruning technique with theoretical analysis and demonstrates significant data reduction and efficiency gains over existing methods.
Findings
Prunes 40% of training data on CIFAR-10.
Halves training convergence time with minimal accuracy loss.
Results align well with theoretical predictions.
Abstract
The great success of deep learning relies heavily on increasingly large training data, which comes at the price of huge computational and infrastructural costs. This raises crucial questions: do all training data contribute to the model's performance? How much does each individual training sample or training subset affect the model's generalization, and how can we construct the smallest subset of the entire training data as a proxy training set without significantly sacrificing the model's performance? To answer these, we propose dataset pruning, an optimization-based sample selection method that can (1) examine the influence of removing a particular set of training samples on the model's generalization ability with a theoretical guarantee, and (2) construct the smallest subset of training data that yields a strictly constrained generalization gap. The empirically observed generalization gap of…
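To make the selection idea in the abstract concrete, here is a minimal sketch under stated assumptions. It assumes each training sample already has an estimated influence vector describing how removing that sample would shift the model, and it uses a simple greedy heuristic to remove as many samples as possible while keeping the norm of their summed influence within a budget. The function name `greedy_prune`, the `influence` array, and the `epsilon` budget are hypothetical names chosen for illustration; the greedy loop is a stand-in and is not the authors' optimization procedure.

```python
import numpy as np

def greedy_prune(influence, epsilon):
    """Greedily pick samples to remove while keeping the norm of their
    summed influence vectors at or below epsilon.

    influence: (n_samples, dim) array; row i is a hypothetical estimate of
               how removing sample i shifts the model (an assumption here).
    epsilon:   hypothetical budget on the norm of the summed influence.
    Returns the sorted list of indices chosen for removal.
    """
    influence = np.asarray(influence, dtype=float)
    removed = []
    total = np.zeros(influence.shape[1])
    remaining = set(range(influence.shape[0]))
    while remaining:
        idx = sorted(remaining)
        # Norm of the summed influence if each candidate were removed next.
        norms = np.linalg.norm(total + influence[idx], axis=1)
        best = int(np.argmin(norms))
        if norms[best] > epsilon:
            break  # no single remaining sample fits within the budget
        total += influence[idx[best]]
        removed.append(idx[best])
        remaining.remove(idx[best])
    return sorted(removed)

# Two samples with opposite influence cancel out and can both be removed;
# the third has a large influence and is kept.
pruned = greedy_prune([[1.0, 0.0], [-1.0, 0.0], [0.0, 5.0]], epsilon=1.0)
```

The greedy rule is only a heuristic: it can stop early even when some larger combination of samples would still satisfy the budget, which is one reason an explicit optimization formulation may be preferable.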
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Neural Networks and Applications
Methods: Dataset Pruning · Pruning
