Dataset Distillation Meets Provable Subset Selection
Murad Tukan, Alaa Maalouf, Margarita Osadchy

TL;DR
This paper introduces a provable, importance-based subset selection method to improve dataset distillation, reducing data redundancy and enhancing synthetic dataset quality for deep learning models.
Contribution
It presents a novel, theoretically grounded approach for initializing and training distilled datasets by identifying important data points, merging subset selection with distillation.
Findings
Improved dataset distillation performance on benchmark tasks.
Effective identification of important data points reduces redundancy.
Enhanced synthetic datasets maintain accuracy with fewer data samples.
Abstract
Deep learning has grown tremendously over recent years, yielding state-of-the-art results in various fields. However, training such models requires huge amounts of data, increasing the computational time and cost. To address this, dataset distillation was proposed to compress a large training dataset into a smaller synthetic one that retains its performance -- this is usually done by (1) uniformly initializing a synthetic set and (2) iteratively updating/learning this set according to a predefined loss by uniformly sampling instances from the full data. In this paper, we improve both phases of dataset distillation: (1) we present a provable, sampling-based approach for initializing the distilled set by identifying important and removing redundant points in the data, and (2) we further merge the idea of data subset selection with dataset distillation, by training the distilled set on…
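The initialization phase described in the abstract replaces uniform selection with importance-based sampling. The following is a minimal sketch of that general idea, not the authors' algorithm: the abstract does not state how importance is computed, so `distance_to_mean_scores` is a hypothetical stand-in score, and the function names and the `m` parameter are illustrative assumptions. It draws points with probability proportional to a score and attaches the standard inverse-probability weights, so that weighted sums over the sample remain unbiased estimates of sums over the full data.

```python
import numpy as np


def distance_to_mean_scores(X):
    """Hypothetical importance proxy (an assumption, not the paper's score):
    squared distance to the data mean, mixed with a uniform term so that
    every point keeps a nonzero sampling probability."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X - X.mean(axis=0), axis=1) ** 2
    total = d.sum()
    if total == 0.0:  # all points identical: fall back to uniform scores
        return np.full(len(X), 1.0 / len(X))
    return d / total + 1.0 / len(X)


def importance_sample(scores, m, rng=None):
    """Draw m indices i.i.d. with probability p_i proportional to scores[i].

    Returns (indices, weights) with weights[k] = 1 / (m * p_i) for the
    index i drawn at position k, the usual importance-sampling reweighting.
    """
    rng = np.random.default_rng(rng)
    scores = np.asarray(scores, dtype=float)
    probs = scores / scores.sum()
    idx = rng.choice(len(scores), size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])
    return idx, weights
```

A call such as `idx, w = importance_sample(distance_to_mean_scores(X), m=50)` would give candidate indices for initializing a distilled set. The same sampler could in principle be reused to pick training instances in the second phase, though how the paper does this is not specified in the text above.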
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
