MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models
Jeffrey A. Chan-Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, Mubarak Shah

TL;DR
This paper introduces MGD$^3$, a mode-guided dataset distillation method using pre-trained diffusion models that improves diversity and accuracy without fine-tuning, reducing computational costs.
Contribution
It proposes a novel mode-guided diffusion approach that enhances dataset diversity and performance without requiring fine-tuning with distillation losses.
Findings
Achieves up to 4.4% accuracy improvement on ImageNette
Outperforms state-of-the-art methods in dataset distillation
Reduces computational costs by eliminating fine-tuning steps
Abstract
Dataset distillation has emerged as an effective strategy, significantly reducing training costs and facilitating more efficient model deployment. Recent advances have leveraged generative models to distill datasets by capturing the underlying data distribution. Unfortunately, existing methods require model fine-tuning with distillation losses to encourage diversity and representativeness. However, these methods do not guarantee sample diversity, limiting their performance. We propose a mode-guided diffusion model leveraging a pre-trained diffusion model without the need to fine-tune with distillation losses. Our approach addresses dataset diversity in three stages: Mode Discovery to identify distinct data modes, Mode Guidance to enhance intra-class diversity, and Stop Guidance to mitigate artifacts in synthetic samples that affect performance. Our approach outperforms state-of-the-art…
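The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not specify the clustering algorithm, the guidance formula, or the denoiser. Everything below is an assumption made for illustration, including the use of plain k-means for Mode Discovery, the linear pull toward a mode for Mode Guidance, the placeholder denoising update, and the hypothetical names `discover_modes`, `guided_sample`, `t_stop`, and `guidance_scale`.

```python
import numpy as np


def discover_modes(features, n_modes, n_iters=20, seed=0):
    """Toy Mode Discovery (assumed: plain k-means over one class's features)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from randomly chosen feature vectors.
    centers = features[rng.choice(len(features), size=n_modes, replace=False)]
    for _ in range(n_iters):
        # Assign every feature vector to its nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each non-empty center to the mean of its members.
        for k in range(n_modes):
            members = features[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return centers


def guided_sample(mode, n_steps=50, t_stop=10, guidance_scale=0.2, seed=0):
    """Toy reverse process with Mode Guidance and Stop Guidance.

    The denoising update is a placeholder, not a real pre-trained diffusion
    model; `t_stop` plays the role of the stop-guidance timestep.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(mode.shape)  # start from pure noise
    for t in range(n_steps, 0, -1):
        # Placeholder for one step of a pre-trained denoiser (assumption).
        x = 0.98 * x + 0.02 * rng.standard_normal(mode.shape)
        if t > t_stop:
            # Mode Guidance: nudge the sample toward the target mode.
            x = x + guidance_scale * (mode - x)
        # Stop Guidance: for t <= t_stop the nudge is switched off.
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two synthetic "modes" of one class in a 4-dimensional feature space.
    feats = np.concatenate([
        rng.normal(0.0, 0.1, size=(50, 4)),
        rng.normal(10.0, 0.1, size=(50, 4)),
    ])
    modes = discover_modes(feats, n_modes=2)
    samples = [guided_sample(m, seed=i) for i, m in enumerate(modes)]
    print(modes.round(2))
    print(np.stack(samples).round(2))
```

Generating one sample per discovered mode is what would spread a small per-class budget across distinct regions of the class, which is the diversity argument the abstract makes. In a real pipeline the placeholder update would be replaced by the pre-trained diffusion model's own sampler.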
Peer Reviews
Decision·Submitted to ICLR 2025
1. The authors consider the soft-label protocol and the hard-label protocol, which are important for fair comparison in dataset distillation. We can see the difference between the two protocols in Table 10. 2. The performance improvements reported in this paper are significant.
1. It is good that the authors clarify the soft-label protocol and the hard-label protocol. However, which protocol is used in Tables 1, 2, and 3? If the soft-label protocol is used, what are the validation epochs, teacher networks, and data augmentation methods? 2. I do not see a clear formulation explaining the derivation of the modes. Only Section 3.2 (lines 307 to 319) discusses the application of modes. My main concern is with the method used to select the modes; specifically,
1. The idea of mode guidance for target-distribution synthesis is reasonable and easy to implement. 2. Extensive experiments and study have been provided.
1. The writing can be improved, especially the logic and causality in the introduction. The citation format is unsuitable. Some words and phrases have inconsistent capitalization. 2. Some statements are not rigorous: a) The authors claim “… diffusion models … do not suffer from mode collapse” in the introduction. Has this been proved theoretically or empirically in previous work? It also conflicts with claims in other paragraphs. b) “We validate this by addressing the following question: Given a pre-train
- Ensuring dataset sample diversity has long been a challenge in the field of dataset distillation. This paper addresses this by employing mode guidance to generate as diverse samples as possible for each class, minimizing redundancy and significantly enhancing intra-class diversity in the generated dataset. - The paper utilizes pre-trained diffusion models to generate datasets without the need for additional fine-tuning, relying only on guidance during the denoising process, which simplifies t
- The mechanisms of mode and stop guidance are clear but lack strong theoretical support. - The method is easy to understand but could be optimized, such as by automating the adjustment of $t_{SG}$ and improving the mode discovery algorithm. - The approach is still fundamentally about image generation via diffusion models, with insufficient exploration of its contribution to dataset distillation.
Taxonomy
Topics: Reservoir Engineering and Simulation Methods
Methods: Diffusion
