Diffusion-Augmented Coreset Expansion for Scalable Dataset Distillation

Ali Abbasi; Shima Imani; Chenyang An; Gayathri Mahalingam; Harsh Shrivastava; Maurice Diesendruck; Hamed Pirsiavash; Pramod Sharma; Soheil Kolouri

arXiv:2412.04668·cs.CV·December 9, 2024

TL;DR

This paper introduces a two-stage method that combines coreset selection with generative models to improve the efficiency, quality, and diversity of dataset distillation, reporting gains of more than 10% over state-of-the-art methods on large-scale benchmarks.

Contribution

It proposes a novel diffusion-augmented coreset expansion technique that enhances dataset distillation by dynamically expanding compressed coresets with generative models.
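
The compress-then-expand idea can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the per-sample `scores`, and the Gaussian-noise `jitter_generator` are all hypothetical stand-ins, and a real pipeline would replace the generator with a diffusion model.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# A sample is (feature vector, class label). Hypothetical toy representation.
Sample = Tuple[List[float], int]


def select_coreset(samples: Sequence[Sample], scores: Sequence[float],
                   per_class: int) -> List[Sample]:
    """Stage 1 (compress): keep the `per_class` highest-scoring samples per class.

    `scores` stands in for any informativeness measure; how it is computed
    is an assumption left outside this sketch.
    """
    by_class: Dict[int, List[Tuple[float, Sample]]] = {}
    for sample, score in zip(samples, scores):
        by_class.setdefault(sample[1], []).append((score, sample))
    coreset: List[Sample] = []
    for label in sorted(by_class):
        ranked = sorted(by_class[label], key=lambda pair: pair[0], reverse=True)
        coreset.extend(sample for _, sample in ranked[:per_class])
    return coreset


def expand_coreset(coreset: Sequence[Sample],
                   generate: Callable[[List[float]], List[float]],
                   variants: int) -> List[Sample]:
    """Stage 2 (expand): create `variants` generated samples per stored sample.

    Labels are copied from the source sample; the originals are not included.
    """
    expanded: List[Sample] = []
    for features, label in coreset:
        for _ in range(variants):
            expanded.append((generate(features), label))
    return expanded


def jitter_generator(rng: random.Random,
                     scale: float = 0.05) -> Callable[[List[float]], List[float]]:
    """Placeholder generator that adds Gaussian noise. NOT a diffusion model."""
    def generate(features: List[float]) -> List[float]:
        return [x + rng.gauss(0.0, scale) for x in features]
    return generate


if __name__ == "__main__":
    data = [([0.0, 0.0], 0), ([1.0, 1.0], 0), ([2.0, 2.0], 1), ([3.0, 3.0], 1)]
    informativeness = [0.1, 0.9, 0.8, 0.2]
    stored = select_coreset(data, informativeness, per_class=1)
    training_set = expand_coreset(stored, jitter_generator(random.Random(0)), variants=3)
    print(len(stored), len(training_set))
```

Only the small coreset needs to be stored or transmitted; the larger training set is regenerated from it on demand, which is where the storage saving would come from under these assumptions.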

Findings

01 Over 10% improvement over state-of-the-art methods

02 Effective compression and expansion of datasets

03 Robust performance across multiple benchmarks

Abstract

With the rapid scaling of neural networks, data storage and communication demands have intensified. Dataset distillation has emerged as a promising solution, condensing information from extensive datasets into a compact set of synthetic samples by solving a bilevel optimization problem. However, current methods face challenges in computational efficiency, particularly with high-resolution data and complex architectures. Recently, knowledge-distillation-based dataset condensation approaches have made this process more computationally feasible. Yet, with the recent developments of generative foundation models, there is now an opportunity to achieve even greater compression, enhance the quality of distilled data, and introduce valuable diversity into the data representation. In this work, we propose a two-stage solution. First, we compress the dataset by selecting only the most informative…

Taxonomy

Topics: Neural Networks and Applications

Methods: Sparse Evolutionary Training