Taming Diffusion for Dataset Distillation with High Representativeness
Lin Zhao, Yushu Wu, Xinru Jiang, Jianyang Gu, Yanzhi Wang, Xiaolin Xu, Pu Zhao, Xue Lin

TL;DR
This paper introduces D^3HR, a diffusion-based framework for dataset distillation that enhances the representativeness of distilled datasets, leading to improved accuracy across various models.
Contribution
The paper proposes a novel diffusion-based method using DDIM inversion and an efficient sampling scheme to generate more representative distilled datasets.
Findings
D^3HR achieves higher accuracy than state-of-the-art methods.
DDIM inversion preserves information and structural consistency when mapping the dataset's latents to the Gaussian domain.
The method improves distribution matching in dataset distillation.
Abstract
Recent deep learning models demand larger datasets, driving the need for dataset distillation to create compact, cost-efficient datasets while maintaining performance. Due to the powerful image generation capability of diffusion, it has been introduced to this field for generating distilled images. In this paper, we systematically investigate issues present in current diffusion-based dataset distillation methods, including inaccurate distribution matching, distribution deviation with random noise, and separate sampling. Building on this, we propose D^3HR, a novel diffusion-based framework to generate distilled datasets with high representativeness. Specifically, we adopt DDIM inversion to map the latents of the full dataset from a low-normality latent domain to a high-normality Gaussian domain, preserving information and ensuring structural consistency to generate representative latents…
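The DDIM inversion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`make_alpha_bars`, `ddim_step`, `ddim_invert`, `ddim_sample`), the linear beta schedule, and the toy noise predictor are all assumptions made for illustration. In practice the predictor would be a trained diffusion network operating on image latents. The sketch only shows the deterministic (eta = 0) DDIM update run forward in time to reach the noisy domain, and backward to return.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type aliases for this sketch: a latent is a flat list of floats,
# and the noise predictor stands in for a trained network eps_theta(x_t, t).
Latent = List[float]
NoisePredictor = Callable[[Sequence[float], int], Latent]


def make_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                    beta_end: float = 0.02) -> List[float]:
    """Cumulative products of (1 - beta) for an assumed linear beta schedule.

    Index 0 corresponds to the clean latent (alpha_bar = 1).
    """
    alpha_bars = [1.0]
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        alpha_bars.append(alpha_bars[-1] * (1.0 - beta))
    return alpha_bars


def ddim_step(x: Sequence[float], eps: Sequence[float],
              ab_from: float, ab_to: float) -> Latent:
    """One deterministic DDIM update between two noise levels."""
    # Estimate the clean latent implied by the current latent and noise.
    x0_pred = [(xi - math.sqrt(1.0 - ab_from) * ei) / math.sqrt(ab_from)
               for xi, ei in zip(x, eps)]
    # Re-noise that estimate to the target noise level with the same noise.
    return [math.sqrt(ab_to) * x0i + math.sqrt(1.0 - ab_to) * ei
            for x0i, ei in zip(x0_pred, eps)]


def ddim_invert(x0: Sequence[float], predict_noise: NoisePredictor,
                alpha_bars: Sequence[float]) -> Latent:
    """Map a clean latent to the noisy domain (timestep 0 up to T)."""
    x = list(x0)
    for t in range(len(alpha_bars) - 1):
        eps = predict_noise(x, t)
        x = ddim_step(x, eps, alpha_bars[t], alpha_bars[t + 1])
    return x


def ddim_sample(x_T: Sequence[float], predict_noise: NoisePredictor,
                alpha_bars: Sequence[float]) -> Latent:
    """Map a noisy latent back to the clean domain (timestep T down to 0)."""
    x = list(x_T)
    for t in range(len(alpha_bars) - 1, 0, -1):
        eps = predict_noise(x, t)
        x = ddim_step(x, eps, alpha_bars[t], alpha_bars[t - 1])
    return x


if __name__ == "__main__":
    # Toy predictor (assumption): always returns the same noise vector.
    toy_predictor = lambda x, t: [0.5] * len(x)
    schedule = make_alpha_bars(50)
    noisy = ddim_invert([1.0, -2.0, 0.25], toy_predictor, schedule)
    print(ddim_sample(noisy, toy_predictor, schedule))
```

With the constant toy predictor, inversion followed by sampling recovers the starting latent up to floating-point error. With a real network the noise estimate differs slightly between adjacent timesteps, so the round trip is only approximate.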
Taxonomy
Topics: Neural Networks and Applications · Data Stream Mining Techniques
Methods: ADaptive gradient method with the OPTimal convergence rate · ALIGN
