Data-Efficient Generation for Dataset Distillation
Zhe Li, Weitong Zhang, Sarah Cechnicka, Bernhard Kainz

TL;DR
This paper introduces a class-conditional latent diffusion model for dataset distillation that generates realistic synthetic images efficiently, reducing training costs and improving downstream task performance.
Contribution
It presents a novel diffusion-based approach for dataset distillation that produces human-readable images quickly, enhancing the quality and efficiency of synthetic datasets.
Findings
Ranked first in the ECCV 2024 Dataset Distillation Challenge.
Generated realistic synthetic images at a rate of several tens of images per second.
Effective training with a small set of synthetic images on large real datasets.
Abstract
While deep learning techniques have proven successful in image-related tasks, the exponentially increased data storage and computation costs become a significant challenge. Dataset distillation addresses these challenges by synthesizing only a few images for each class that encapsulate all essential information. Most current methods focus on matching. The problems lie in the synthetic images not being human-readable and the dataset performance being insufficient for downstream learning tasks. Moreover, the distillation time can quickly get out of bounds when the number of synthetic images per class increases even slightly. To address this, we train a class-conditional latent diffusion model capable of generating realistic synthetic images with labels. The sampling speed reaches several tens of images per second. We demonstrate that models can be effectively trained using only…
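To put the reported sampling rate in perspective, the arithmetic behind it can be sketched as follows. This is only a rough illustration: the function names are hypothetical, and the class count (100) and throughput (25 images per second) are assumed values chosen for the example, not figures taken from the paper.

```python
def distilled_set_size(num_classes: int, images_per_class: int) -> int:
    """Total number of synthetic images in a distilled dataset."""
    return num_classes * images_per_class


def sampling_seconds(num_classes: int, images_per_class: int,
                     images_per_second: float) -> float:
    """Estimated wall-clock time to sample the whole distilled set.

    Assumes a constant sampling throughput, which is a simplification.
    """
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    return distilled_set_size(num_classes, images_per_class) / images_per_second


if __name__ == "__main__":
    # Assumed values for illustration only (not from the paper).
    num_classes = 100
    throughput = 25.0  # images per second, i.e. "several tens"
    for ipc in (1, 10, 50):
        total = distilled_set_size(num_classes, ipc)
        secs = sampling_seconds(num_classes, ipc, throughput)
        print(f"IPC={ipc}: {total} images, about {secs:.0f} s to sample")
```

Under these assumptions the sampling cost grows only linearly with the number of images per class, which is the property the abstract contrasts with distillation times that grow out of bounds.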
Taxonomy
Topics: Data Stream Mining Techniques · Neural Networks and Applications
Methods: Sparse Evolutionary Training · Diffusion · Latent Diffusion Model · Focus
