Distill the Best, Ignore the Rest: Improving Dataset Distillation with Loss-Value-Based Pruning
Brian B. Moser, Federico Raue, Tobias C. Nauen, Stanislav Frolov, Andreas Dengel

TL;DR
This paper presents a novel dataset distillation method that prunes non-beneficial samples using loss-based sampling before distillation, resulting in improved generalization and accuracy even with significant dataset reduction.
Contribution
Introduces a 'Prune First, Distill After' framework that enhances dataset distillation by systematically pruning datasets prior to distillation, improving performance and robustness.
Findings
Achieves an accuracy improvement of up to 5.2 percentage points, even when 80% of the original dataset is pruned before distillation.
Creates a representative core-set that generalizes better to unseen architectures.
Demonstrates robustness and effectiveness of the pruning-based distillation approach.
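The headline numbers mix two kinds of quantity, so a small calculation helps keep them apart. It separates an absolute gain in percentage points from a relative gain and shows how many samples remain after 80% pruning. The function names, the 40.0% baseline accuracy and the class size of 1,300 images are hypothetical values chosen for illustration. They are not results reported in the paper.

```python
# Illustrative arithmetic only: the inputs below are hypothetical, not paper results.


def kept_after_pruning(num_samples: int, prune_ratio: float) -> int:
    """Samples that remain after removing `prune_ratio` of them (0.8 = remove 80%)."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    return round(num_samples * (1.0 - prune_ratio))


def gain_in_points(baseline_acc: float, new_acc: float) -> float:
    """Absolute gain in percentage points; accuracies are given in percent."""
    return new_acc - baseline_acc


def relative_gain_percent(baseline_acc: float, new_acc: float) -> float:
    """Relative gain, expressed as a percentage of the baseline accuracy."""
    return 100.0 * (new_acc - baseline_acc) / baseline_acc


if __name__ == "__main__":
    print(kept_after_pruning(1300, 0.8))  # 260 samples kept out of 1,300
    print(round(gain_in_points(40.0, 45.2), 1))  # 5.2 percentage points
    print(round(relative_gain_percent(40.0, 45.2), 1))  # 13.0 percent relative
```

The two measures differ whenever the baseline is not 100%. On a hypothetical 40% baseline, a 5.2-point gain is a 13% relative improvement. This is why "percentage points" is the unambiguous unit for an absolute difference in accuracy.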
Abstract
Dataset distillation has gained significant interest in recent years, yet existing approaches typically distill from the entire dataset, potentially including non-beneficial samples. We introduce a novel "Prune First, Distill After" framework that systematically prunes datasets via loss-based sampling prior to distillation. By leveraging pruning before classical distillation techniques and generative priors, we create a representative core-set that leads to enhanced generalization for unseen architectures, a significant challenge for current distillation methods. More specifically, our proposed framework significantly boosts distilled quality, achieving an accuracy increase of up to 5.2 percentage points even with substantial dataset pruning, i.e., removing 80% of the original dataset prior to distillation. Overall, our experimental results highlight the advantages of our easy-sample…
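The loss-based pruning step that runs before distillation can be sketched as follows. The sketch makes several assumptions that the summary does not confirm:
- per-sample losses are cross-entropy values computed from a classifier's predicted probabilities;
- samples are ranked within each class so that class balance is preserved;
- `keep_easy=True` keeps the lowest-loss samples, which is suggested only by the title and the truncated "easy-sample" phrase above.

The names `cross_entropy`, `loss_based_prune` and `keep_easy` are hypothetical. The retained indices would then be passed to whichever distillation method is used, and that step is not shown.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def cross_entropy(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Per-sample cross-entropy loss from predicted class probabilities."""
    return -math.log(max(probs[label], eps))


def loss_based_prune(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    prune_ratio: float,
    keep_easy: bool = True,
) -> List[int]:
    """Return sorted indices of the retained core-set (hypothetical helper).

    prune_ratio=0.8 removes 80% of each class. With keep_easy=True the
    lowest-loss samples are kept; with keep_easy=False the highest-loss ones.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")

    # Group sample indices by class label (assumed per-class ranking).
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept: List[int] = []
    for label, indices in by_class.items():
        losses = {i: cross_entropy(probs[i], label) for i in indices}
        # Ascending loss (easiest first) when keep_easy is True, descending otherwise.
        ranked = sorted(indices, key=lambda i: losses[i], reverse=not keep_easy)
        # Keep at least one sample per class so no class disappears entirely.
        n_keep = max(1, round(len(indices) * (1.0 - prune_ratio)))
        kept.extend(ranked[:n_keep])
    return sorted(kept)
```

For example, take ten samples split evenly over two classes with `prune_ratio=0.8`. The sketch keeps exactly one sample per class: the one whose true-class probability is highest. Setting `keep_easy=False` instead keeps the hardest sample of each class, which makes it easy to compare both selection rules.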
Taxonomy
Topics: Machine Learning and Data Classification · Data Mining Algorithms and Applications · Data Quality and Management
Methods: Pruning
