CoDA: From Text-to-Image Diffusion Models to Training-Free Dataset Distillation
Letian Zhou, Songhua Liu, Xinchao Wang

TL;DR
CoDA introduces a training-free dataset distillation method that uses a text-to-image model and core distribution alignment to produce high-quality, representative datasets without target-specific generative training, achieving state-of-the-art results.
Contribution
The paper proposes a novel framework, CoDA, that distills datasets using off-the-shelf text-to-image models and core distribution alignment, eliminating the need for target-specific generative models.
Findings
Achieves state-of-the-art accuracy of 60.4% on ImageNet-1K with 50 images per class.
Outperforms previous methods relying on target-specific diffusion models.
Bridges the gap between the web-scale priors of general text-to-image models and target-specific semantics.
Abstract
Prevailing Dataset Distillation (DD) methods leveraging generative models confront two fundamental limitations. First, despite pioneering the use of diffusion models in DD and delivering impressive performance, the vast majority of approaches paradoxically require a diffusion model pre-trained on the full target dataset, undermining the very purpose of DD and incurring prohibitive training costs. Second, although some methods turn to general text-to-image models without relying on such target-specific training, they suffer from a significant distributional mismatch, as the web-scale priors encapsulated in these foundation models fail to faithfully capture the target-specific semantics, leading to suboptimal performance. To tackle these challenges, we propose Core Distribution Alignment (CoDA), a framework that enables effective DD using only an off-the-shelf text-to-image model. Our key…
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper clearly identifies a practical gap in current diffusion-based dataset distillation: many approaches rely on a diffusion model trained on the very dataset they aim to compress. CoDA directly addresses this by using only an off-the-shelf model.
2. The two-stage pipeline, (i) discovering a core distribution in latent space and (ii) aligning the diffusion generation to that distribution, matches the problem formulation and makes the contribution transparent.
3. The use of latent embeddi
1. Although the approach does not train a diffusion model, the discovery stage (encoding, dimensionality reduction, clustering) plus guided sampling still incurs noticeable computational cost. A more explicit comparison of wall-clock time and memory with key baselines would make the “training-free” claim more convincing.
2. The quality of the discovered core distribution may depend on settings for the dimensionality reduction and clustering steps. The paper suggests that different datasets may
1. The paper tackles a fundamental, conceptual flaw in a popular research area and provides a complete, working solution. This "truly training-free" framework is what generative DD should have been from the start.
2. The 2-stage design is a key strength. The paper identifies that both discovering the right distribution and aligning to it are essential. The Distribution Discovery pipeline is a major contribution on its own.
3. The discovery that the generated set Ours (G) can outperform
1. The authors disclose that the key hyperparameters for the Distribution Discovery stage (n_neighbors for UMAP and min_cluster_size for HDBSCAN) exhibit "drift" across different datasets. While the method remains robustly above the baseline, achieving peak performance requires a new, dataset-specific grid search. This somewhat undermines the "plug-and-play" nature of the "truly training-free" framework.
2. This entire chain is extremely brittle. A small, insignificant change in the initial
- **Clear formulation**: the paper presents a clean DD framework that combines the clustering-based representative discovery (such as D$^4$M and MGD$^3$) and energy-based diffusion guidance (such as MGD$^3$ and IGD). Even though both components exist in prior works, CoDA articulates them under a consistent probabilistic formulation (Eq. 4–7), which makes it easy to follow and reproduce.
- **Strong empirical performance and generalization**: CoDA achieves SOTA accuracy on multiple benchmarks, an
- **Overstated “truly training-free DD” claim and misleading title**: The title “From Text-to-Image Diffusion Models to Truly Training-Free Dataset Distillation” is somewhat exaggerated and conceptually inconsistent. The authors argue that prior generative DD methods are paradoxical because they depend on diffusion models pretrained on the target dataset, such as ImageNet-1K, whereas CoDA avoids this by using a general text-to-image diffusion model (SDXL). However, this reasoning is misleading: usi
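The reviews above describe a two-stage pipeline: a discovery stage (encoding, dimensionality reduction, clustering) followed by energy-based guidance toward the discovered distribution. The following is a minimal toy sketch of that general idea, not the authors' implementation. To stay self-contained it swaps in PCA for UMAP and plain k-means for HDBSCAN, replaces the diffusion model with a unit-Gaussian prior, and applies guidance in the reduced space. All function names (`pca_reduce`, `kmeans`, `guided_sample`), the quadratic energy, and the parameter values are assumptions made for illustration.

```python
import numpy as np


def pca_reduce(x, dim):
    """Project rows of x onto the top `dim` principal components (stand-in for UMAP)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


def kmeans(x, k, iters=20):
    """Lloyd's k-means with farthest-point initialisation (stand-in for HDBSCAN)."""
    centroids = [x[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen centroid.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids, labels


def guided_sample(z0, centroid, scale=4.0, step=0.1, n_steps=200):
    """Noise-free toy sampler: prior score plus an energy gradient toward a centroid."""
    z = z0.copy()
    for _ in range(n_steps):
        prior_score = -z            # score of a unit Gaussian (assumed toy prior)
        energy_grad = z - centroid  # gradient of E(z) = 0.5 * ||z - centroid||^2
        z = z + step * (prior_score - scale * energy_grad)
    return z


# Synthetic "encoded features": two well-separated blobs in 8 dimensions.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(-5.0, 0.5, size=(50, 8)),
    rng.normal(+5.0, 0.5, size=(50, 8)),
])

low_dim = pca_reduce(features, dim=2)         # dimensionality reduction
centroids, labels = kmeans(low_dim, k=2)      # clustering into candidate "core" modes

z0 = rng.normal(size=2)
guided = guided_sample(z0, centroids[0])                 # pulled toward the first mode
unguided = guided_sample(z0, centroids[0], scale=0.0)    # collapses to the prior mean
```

In this toy setting the guidance scale trades off the prior against the target mode: with `scale=4.0` the guided sample settles at a point between the prior mean and the centroid, much closer to the centroid, while `scale=0.0` ignores the centroid entirely. The hyperparameter sensitivity the reviewers raise (n_neighbors, min_cluster_size) is not modelled here, since the stand-in methods do not have those parameters.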
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
