MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models

Jeffrey A. Chan-Santiago; Praveen Tirupattur; Gaurav Kumar Nayak; Gaowen Liu; Mubarak Shah

arXiv:2505.18963 · cs.CV · May 27, 2025

TL;DR

This paper introduces MGD$^3$, a mode-guided dataset distillation method using pre-trained diffusion models that improves diversity and accuracy without fine-tuning, reducing computational costs.

Contribution

It proposes a novel mode-guided diffusion approach that enhances dataset diversity and performance without requiring fine-tuning with distillation losses.

Findings

1. Achieves up to 4.4% accuracy improvement on ImageNette
2. Outperforms state-of-the-art methods in dataset distillation
3. Reduces computational costs by eliminating fine-tuning steps

Abstract

Dataset distillation has emerged as an effective strategy, significantly reducing training costs and facilitating more efficient model deployment. Recent advances have leveraged generative models to distill datasets by capturing the underlying data distribution. Unfortunately, existing methods require model fine-tuning with distillation losses to encourage diversity and representativeness. However, these methods do not guarantee sample diversity, limiting their performance. We propose a mode-guided diffusion model leveraging a pre-trained diffusion model without the need to fine-tune with distillation losses. Our approach addresses dataset diversity in three stages: Mode Discovery to identify distinct data modes, Mode Guidance to enhance intra-class diversity, and Stop Guidance to mitigate artifacts in synthetic samples that affect performance. Our approach outperforms state-of-the-art…
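The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how modes are found or how guidance is applied, so the k-means clustering in `discover_modes`, the linear pull toward a mode, the placeholder `toy_denoise_step`, and every parameter name (`t_sg`, `scale`, `num_steps`) are assumptions made for illustration only. The name `t_sg` echoes the $t_{SG}$ symbol mentioned in the reviews.

```python
import numpy as np


def discover_modes(features, k, iters=20):
    """Mode Discovery (assumed): k-means over one class's feature vectors."""
    # Farthest-point initialisation keeps this toy example deterministic.
    centroids = [features[0]]
    for _ in range(1, k):
        dists = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(features[dists.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(iters):
        # Assign every point to its nearest centroid, then recompute the means.
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids


def toy_denoise_step(x, t, num_steps):
    """Hypothetical stand-in for one reverse step of a pre-trained diffusion model."""
    return x * (1.0 - 1.0 / (num_steps + 1))


def sample_with_mode_guidance(mode, num_steps=50, t_sg=25, scale=0.1, seed=0,
                              denoise_step=toy_denoise_step):
    """Mode Guidance with Stop Guidance (assumed form).

    The sample is pulled toward `mode` only while t > t_sg; the remaining
    steps run unguided, which is one way to read "Stop Guidance".
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(mode.shape)
    guided_steps = 0
    for t in range(num_steps, 0, -1):
        x = denoise_step(x, t, num_steps)
        if t > t_sg:
            x = x + scale * (mode - x)  # assumed linear pull toward the mode
            guided_steps += 1
    return x, guided_steps
```

In this sketch, one synthetic sample would be drawn per discovered mode, so a class with `k` modes yields `k` samples that start from different targets. Setting `t_sg` equal to `num_steps` disables guidance entirely, which gives a simple baseline for comparison.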

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 4

Strengths

1. The authors consider both the soft-label protocol and the hard-label protocol, which is important for fair comparison in dataset distillation. The difference between the two protocols can be seen in Table 10.
2. The performance improvements reported in this paper are significant.

Weaknesses

1. It is good that the authors clarify the soft-label protocol and the hard-label protocol. However, which protocol is used in Tables 1, 2, and 3? If the soft-label protocol is used, what are the validation epochs, teacher networks, and data augmentation methods?
2. I do not see a clear formulation explaining the derivation of the modes. Only Section 3.2 (lines 307 to 319) discusses the application of modes. My main concern is with the method used to select the modes; specifically,

Reviewer 02 · Rating 3 · Confidence 4

Strengths

1. The idea of mode guidance for target-distribution synthesis is reasonable and easy to implement.
2. Extensive experiments and studies have been provided.

Weaknesses

1. The writing can be improved, especially the logic and causality in the introduction. The citation format is unsuitable. Some words and phrases have inconsistent capitalization.
2. Some statements are not rigorous: a) The authors claim “… diffusion models … do not suffer from mode collapse” in the introduction. Has this been proved theoretically or empirically in previous work? It also conflicts with claims in other paragraphs. b) “We validate this by addressing the following question: Given a pre-train

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- Ensuring dataset sample diversity has long been a challenge in the field of dataset distillation. This paper addresses this by employing mode guidance to generate as diverse samples as possible for each class, minimizing redundancy and significantly enhancing intra-class diversity in the generated dataset.
- The paper utilizes pre-trained diffusion models to generate datasets without the need for additional fine-tuning, relying only on guidance during the denoising process, which simplifies t

Weaknesses

- The mechanisms of mode and stop guidance are clear but lack strong theoretical support.
- The method is easy to understand but could be optimized, such as by automating the adjustment of $t_{SG}$ and improving the mode discovery algorithm.
- The approach is still fundamentally about image generation via diffusion models, with insufficient exploration of its contribution to dataset distillation.

Videos

MGD$^3$: Mode-Guided Dataset Distillation using Diffusion Models · SlidesLive

Taxonomy

Topics · Reservoir Engineering and Simulation Methods

Methods · Diffusion