IMS3: Breaking Distributional Aggregation in Diffusion-Based Dataset Distillation
Chenru Wang, Yunyi Chen, Zijun Yang, Joey Tianyi Zhou, Chi Zhang

TL;DR
This paper introduces IMS3, a diffusion-based dataset distillation method that improves diversity and class separation in synthetic datasets. This leads to better generalization and state-of-the-art results among diffusion-based methods.
Contribution
The paper proposes two strategies, Inversion-Matching (IM) and Selective Subgroup Sampling. They address limited distributional coverage and weak class separability in diffusion-based dataset distillation.
Findings
Enhanced dataset diversity and coverage.
Improved inter-class separability.
Achieved state-of-the-art performance among diffusion-based methods.
Abstract
Dataset Distillation aims to synthesize compact datasets that can approximate the training efficacy of large-scale real datasets, offering an efficient solution to the increasing computational demands of modern deep learning. Recently, diffusion-based dataset distillation methods have shown great promise by leveraging the strong generative capacity of diffusion models to produce diverse and structurally consistent samples. However, a fundamental goal misalignment persists: diffusion models are optimized for generative likelihood rather than discriminative utility, resulting in over-concentration in high-density regions and inadequate coverage of boundary samples crucial for classification. To address this issue, we propose two complementary strategies. Inversion-Matching (IM) introduces an inversion-guided fine-tuning process that aligns denoising trajectories with their inversion…
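Inversion-Matching builds on diffusion inversion: the sampler is run in reverse to map a data point to a noisy latent, which can then be denoised back. The snippet below is a minimal sketch of that round trip. It assumes a deterministic DDIM sampler (eta = 0), a cumulative-alpha schedule whose first entry is 1.0 (the clean sample), and a generic noise predictor `eps_model(x, t)`. All function names are hypothetical. `trajectory_gap` is only an illustrative measure of how far a denoising path drifts from its inversion path. It is not the paper's fine-tuning objective, which the truncated abstract does not specify.

```python
import numpy as np


def ddim_step(x, eps, a_from, a_to):
    """One deterministic DDIM update (eta = 0) between cumulative alphas a_from and a_to."""
    # Clean sample implied by the current state and the noise estimate.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    # Move that prediction to the target noise level, reusing the same noise estimate.
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps


def ddim_invert(x0, eps_model, alphas):
    """Inversion: walk from the clean sample toward higher noise; returns [x_0, ..., x_T]."""
    x, path = x0, [x0]
    for t in range(len(alphas) - 1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t + 1])
        path.append(x)
    return path


def ddim_sample(xT, eps_model, alphas):
    """Denoising: walk from the latent back to a clean sample; returns [x_T, ..., x_0]."""
    x, path = xT, [xT]
    for t in range(len(alphas) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), alphas[t], alphas[t - 1])
        path.append(x)
    return path


def trajectory_gap(inv_path, den_path):
    """Hypothetical alignment measure: mean squared distance between matching steps."""
    # Reverse the denoising path so both lists are ordered x_0 ... x_T.
    pairs = zip(inv_path, den_path[::-1])
    return float(np.mean([np.mean((a - b) ** 2) for a, b in pairs]))


alphas = np.array([1.0, 0.9, 0.7, 0.4, 0.1])  # hypothetical schedule, clean to noisy
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
fixed_eps = rng.standard_normal(4)

# Input-independent predictor: inversion and denoising are exact inverses.
const_model = lambda x, t: fixed_eps
inv = ddim_invert(x0, const_model, alphas)
den = ddim_sample(inv[-1], const_model, alphas)

# State-dependent predictor: the two paths no longer coincide.
state_model = lambda x, t: 0.1 * x
inv2 = ddim_invert(x0, state_model, alphas)
den2 = ddim_sample(inv2[-1], state_model, alphas)
print(trajectory_gap(inv, den), trajectory_gap(inv2, den2))
```

When the noise estimate does not depend on the state, each inversion step is exactly undone by the matching denoising step, so the gap is zero. When the predictor depends on the state, the denoising path drifts away from its inversion path. An inversion-guided fine-tuning step could plausibly target this kind of mismatch.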
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neuroimaging Techniques and Applications · Domain Adaptation and Few-Shot Learning
