DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models

Qichao Wang; Yunhong Lu; Hengyuan Cao; Junyi Zhang; Min Zhang

arXiv:2605.03877·cs.CV·May 6, 2026

TL;DR

The paper introduces DMGD, a training-free dataset distillation method using diffusion models with semantic and distribution matching, achieving state-of-the-art results efficiently.

Contribution

Proposes a novel training-free diffusion-based dataset distillation framework with semantic and distribution matching, eliminating the need for fine-tuning.

Findings

01 Outperforms SOTA methods on ImageNet variants with accuracy gains of 2.1%, 5.4%, and 2.4%.

02 Introduces semantic matching via likelihood optimization without auxiliary classifiers.

03 Develops efficient strategies for distribution matching with minimal computational overhead.
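
As background for the second finding, one standard way to obtain class guidance from a conditional diffusion model without a separate classifier is Bayes' rule applied to the scores. The identity below is general and is not taken from the paper; the noise-prediction network $\epsilon_\theta$, the noise level $\sigma_t$, and the class label $c$ are notation assumed here for illustration, and DMGD's actual objective may differ.

```latex
\begin{align}
\nabla_{x_t} \log p(c \mid x_t)
  &= \nabla_{x_t} \log p(x_t \mid c) - \nabla_{x_t} \log p(x_t) \\
  &\approx -\frac{1}{\sigma_t}\bigl(\epsilon_\theta(x_t, t, c) - \epsilon_\theta(x_t, t)\bigr)
\end{align}
```

The second line uses the usual approximation $\nabla_{x_t} \log p(x_t) \approx -\epsilon_\theta(x_t, t)/\sigma_t$, so the class-conditional signal comes from the difference of two forward passes of the same model.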

Abstract

Dataset distillation enables efficient training by distilling the information of large-scale datasets into significantly smaller synthetic datasets. Diffusion-based paradigms have emerged in recent years, offering novel perspectives for dataset distillation. However, they typically require additional fine-tuning stages, and effective guidance mechanisms remain underexplored. To address these limitations, we rethink diffusion-based dataset distillation and propose a Dual Matching Guided Diffusion (DMGD) framework, centered on efficient training-free guidance. We first establish Semantic Matching via conditional likelihood optimization, eliminating the need for auxiliary classifiers. Furthermore, we propose a dynamic guidance mechanism that enhances the diversity of synthetic data while maintaining semantic alignment. Simultaneously, we introduce an optimal transport (OT) based…
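
The abstract is cut off before it says how optimal transport is used, so the following is only a generic illustration of what an OT cost between two feature sets looks like, not the paper's procedure. It is a minimal sketch of entropy-regularised OT (Sinkhorn iterations) in NumPy; the function names `sinkhorn_plan` and `transport_cost`, the uniform weights, the squared-Euclidean cost, and the `reg` and `n_iters` values are all assumptions made for this example.

```python
import numpy as np


def sinkhorn_plan(x, y, reg=0.1, n_iters=200):
    """Entropy-regularised OT plan between point sets x (n, d) and y (m, d).

    Hypothetical helper for illustration: uniform weights on both sets and a
    squared-Euclidean ground cost are assumptions, not details from the paper.
    """
    n, m = len(x), len(y)
    a = np.full(n, 1.0 / n)  # uniform mass on the first set
    b = np.full(m, 1.0 / m)  # uniform mass on the second set

    # Pairwise squared Euclidean distances, shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

    # Rescale so exp(-cost / reg) stays well away from underflow.
    scale = cost.max() or 1.0
    kernel = np.exp(-(cost / scale) / reg)

    # Sinkhorn iterations: alternately rescale rows and columns.
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)

    plan = u[:, None] * kernel * v[None, :]
    return plan, cost


def transport_cost(x, y, reg=0.1, n_iters=200):
    """Total cost of moving x onto y under the regularised plan."""
    plan, cost = sinkhorn_plan(x, y, reg=reg, n_iters=n_iters)
    return float((plan * cost).sum())
```

In a distillation setting, `x` could be features of synthetic samples and `y` features of real samples, with a lower `transport_cost` indicating closer distributions; how DMGD turns such a quantity into guidance is not recoverable from the truncated abstract.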
