Accelerating Diffusion Model Training under Minimal Budgets: A Condensation-Based Perspective
Rui Huang, Shitong Shao, Zikai Zhou, Pukun Zhao, Hangyu Guo, Tian Ye, Lichen Bai, Shuo Yang, Zeke Xie

TL;DR
This paper introduces D2C, a dataset condensation framework that lets diffusion models be trained on a small condensed subset of the original data, cutting training time by more than 100× while maintaining high generation quality.
Contribution
The paper presents the first systematic dataset condensation method tailored for diffusion models, combining selection and augmentation phases to enable fast training with minimal data.
Findings
D2C achieves high-quality diffusion results with only 0.8% of ImageNet data.
Training is more than 100× faster than standard training.
D2C maintains competitive FID scores across various models and resolutions.
Abstract
Diffusion models have achieved remarkable performance on a wide range of generative tasks, yet training them from scratch is notoriously resource-intensive, typically requiring millions of training images and many GPU days. Motivated by a data-centric view of this bottleneck, we adopt a condensation-based perspective: given a large training set, the goal is to construct a much smaller condensed dataset that still supports training strong diffusion models under minimal data and compute budgets. To operationalize this perspective, we introduce Diffusion Dataset Condensation (D2C), a two-phase framework comprising Select and Attach. In the Select phase, a diffusion difficulty score combined with interval sampling is used to identify a compact, informative training subset from the original data. Building on this subset, the Attach phase further strengthens the conditional signals by…
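The Select phase described above can be illustrated with a minimal sketch. It assumes a per-sample diffusion difficulty score has already been computed (how the paper computes it is not shown in this excerpt), and it assumes "interval sampling" means sorting samples by that score and keeping every k-th one, so the subset spans the whole difficulty range. The function name `interval_select` and its parameters are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def interval_select(scores, budget):
    """Return `budget` sample indices taken at a fixed stride over the
    difficulty ranking (hypothetical helper, not the authors' code)."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and len(scores)")
    # Rank samples from lowest to highest difficulty score.
    order = np.argsort(scores, kind="stable")
    # Assumed reading of interval sampling: keep every k-th ranked sample.
    stride = n // budget
    return order[::stride][:budget]


if __name__ == "__main__":
    # Toy difficulty scores for six samples; keep three of them.
    toy_scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
    print(interval_select(toy_scores, budget=3))
```

In a class-conditional setting, one plausible use is to call such a function once per class with a per-class budget, so that every label keeps the same number of samples. That too is an assumption about usage rather than a detail confirmed by this excerpt.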
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Neuroimaging Techniques and Applications
Methods: Diffusion
