Diffusion-Augmented Coreset Expansion for Scalable Dataset Distillation
Ali Abbasi, Shima Imani, Chenyang An, Gayathri Mahalingam, Harsh Shrivastava, Maurice Diesendruck, Hamed Pirsiavash, Pramod Sharma, Soheil Kolouri

TL;DR
This paper introduces a two-stage method that combines coreset selection with generative models to improve the efficiency, quality, and diversity of dataset distillation, reporting an improvement of over 10% over state-of-the-art methods on large-scale benchmarks.
Contribution
It proposes a novel diffusion-augmented coreset expansion technique that enhances dataset distillation by dynamically expanding compressed coresets with generative models.
Findings
Over 10% improvement over state-of-the-art methods
Effective compression and expansion of datasets
Robust performance across multiple benchmarks
Abstract
With the rapid scaling of neural networks, data storage and communication demands have intensified. Dataset distillation has emerged as a promising solution, condensing information from extensive datasets into a compact set of synthetic samples by solving a bilevel optimization problem. However, current methods face challenges in computational efficiency, particularly with high-resolution data and complex architectures. Recently, knowledge-distillation-based dataset condensation approaches have made this process more computationally feasible. Yet, with the recent developments of generative foundation models, there is now an opportunity to achieve even greater compression, enhance the quality of distilled data, and introduce valuable diversity into the data representation. In this work, we propose a two-stage solution. First, we compress the dataset by selecting only the most informative…
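For reference, the bilevel optimization problem mentioned in the abstract is commonly written as below. This is the standard formulation from the dataset distillation literature, not necessarily the notation used in this paper. The symbols are assumptions introduced here for illustration: \mathcal{T} is the original dataset, \mathcal{S} is the compact synthetic set, \theta denotes the model parameters, and \mathcal{L} is the training loss.

```latex
% Outer problem: choose the synthetic set S so that a model trained on S
% performs well on the original dataset T.
% Inner problem: train the model parameters theta on S.
\begin{aligned}
\mathcal{S}^{*} &= \arg\min_{\mathcal{S}} \; \mathcal{L}\bigl(\theta^{*}(\mathcal{S});\, \mathcal{T}\bigr) \\
\text{s.t.}\quad \theta^{*}(\mathcal{S}) &= \arg\min_{\theta} \; \mathcal{L}\bigl(\theta;\, \mathcal{S}\bigr)
\end{aligned}
```

Each evaluation of the outer objective requires solving the inner training problem, or approximating it, which is one reason this formulation becomes expensive for high-resolution data and complex architectures.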
Taxonomy
Topics: Neural Networks and Applications
Methods: Sparse Evolutionary Training
