Information-Guided Diffusion Sampling for Dataset Distillation

Linfeng Ye; Shayan Mohajer Hamidi; Guang Li; Takahiro Ogawa; Miki Haseyama; Konstantinos N. Plataniotis

arXiv:2507.04619 · cs.LG · July 8, 2025

TL;DR

This paper introduces an information-theoretic approach to improve dataset distillation using diffusion models, especially in low images-per-class settings, by maximizing prototype and contextual information during sampling.

Contribution

The paper proposes information-guided diffusion sampling (IGDS), a method that improves dataset distillation by estimating and maximizing prototype information and contextual information during diffusion sampling.

Findings

01. IGDS outperforms existing methods across all IPC settings.

02. Significant improvements in low-IPC regimes on Tiny ImageNet and ImageNet.

03. Effective integration of information theory with diffusion models.

Abstract

Dataset distillation aims to create a compact dataset that retains essential information while maintaining model performance. Diffusion models (DMs) have shown promise for this task but struggle in low images-per-class (IPC) settings, where generated samples lack diversity. In this paper, we address this issue from an information-theoretic perspective by identifying two key types of information that a distilled dataset must preserve: (i) prototype information $I(X; Y)$, which captures label-relevant features; and (ii) contextual information $H(X \mid Y)$, which preserves intra-class variability. Here, $(X, Y)$ represents the pair of random variables corresponding to the input data and its ground truth label, respectively. Observing that the required contextual information scales with IPC, we propose maximizing $I(X; Y) + \beta H(X \mid Y)$ during the DM…
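To make the two quantities in the abstract concrete, the sketch below evaluates $I(X; Y)$, $H(X \mid Y)$, and the combined score $I(X; Y) + \beta H(X \mid Y)$ on a tiny discrete joint distribution. This is a toy illustration only: the helper names (`marginal`, `entropy`, `objective`), the two example distributions, and the value of `beta` are assumptions made here, and the code is not the paper's estimator, which operates on images during diffusion sampling and is not described in the text available above.

```python
import math
from typing import Dict, Tuple

# A joint distribution over (x, y) pairs: (sample id, label) -> probability.
Joint = Dict[Tuple[str, int], float]


def marginal(joint: Joint, axis: int) -> Dict[object, float]:
    """Sum the joint over the other variable (axis 0 -> p(x), axis 1 -> p(y))."""
    out: Dict[object, float] = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


def entropy(dist: Dict[object, float]) -> float:
    """Shannon entropy in bits of any probability table."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def conditional_entropy(joint: Joint) -> float:
    """Contextual information H(X|Y) = H(X,Y) - H(Y)."""
    return entropy(joint) - entropy(marginal(joint, 1))


def mutual_information(joint: Joint) -> float:
    """Prototype information I(X;Y) = H(X) - H(X|Y)."""
    return entropy(marginal(joint, 0)) - conditional_entropy(joint)


def objective(joint: Joint, beta: float) -> float:
    """Combined score I(X;Y) + beta * H(X|Y) from the abstract."""
    return mutual_information(joint) + beta * conditional_entropy(joint)


# Hypothetical "collapsed" set: one sample per class, no intra-class variety.
collapsed: Joint = {("a", 0): 0.5, ("b", 1): 0.5}

# Hypothetical "diverse" set: two equally likely samples per class.
diverse: Joint = {
    ("a1", 0): 0.25, ("a2", 0): 0.25,
    ("b1", 1): 0.25, ("b2", 1): 0.25,
}

if __name__ == "__main__":
    for name, joint in [("collapsed", collapsed), ("diverse", diverse)]:
        print(name, mutual_information(joint), conditional_entropy(joint),
              objective(joint, beta=0.5))
```

In this toy case both sets carry the same prototype information (1 bit, since each sample determines its label), but only the diverse set has nonzero contextual information, so its combined score is higher for any positive `beta`.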

Taxonomy

Methods: Diffusion