Learnability-Guided Diffusion for Dataset Distillation

Jeffrey A. Chan-Santiago; Mubarak Shah

arXiv:2604.00519 · cs.CV · April 2, 2026

TL;DR

This paper introduces a learnability-guided diffusion method for dataset distillation that incrementally constructs synthetic datasets, reducing redundancy and improving performance on image classification benchmarks.

Contribution

The paper proposes a curriculum-based approach that uses learnability scores and diffusion models to generate more effective, less redundant synthetic datasets for training machine learning models.

Findings

01. Reduces dataset redundancy by 39.1%.

02. Achieves state-of-the-art results on ImageNet-1K, with 60.1% accuracy.

03. Promotes specialization across training stages.

Abstract

Training machine learning models on massive datasets is expensive and time-consuming. Dataset distillation addresses this by creating a small synthetic dataset that achieves the same performance as the full dataset. Recent methods use diffusion models to generate distilled data, either by promoting diversity or matching training gradients. However, existing approaches produce redundant training signals, where samples convey overlapping information. Empirically, disjoint subsets of distilled datasets capture 80-90% overlapping signals. This redundancy stems from optimizing visual diversity or average training dynamics without accounting for similarity across samples, leading to datasets where multiple samples share similar information rather than complementary knowledge. We propose learnability-driven dataset distillation, which constructs synthetic datasets incrementally through…
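
The general idea of building a small training set in stages, guided by a per-sample score, can be pictured with a toy sketch. Everything in it is an assumption made for illustration and not the paper's method: the score (loss under the current model minus loss under a reference model), the linear softmax classifier, the fixed candidate pool standing in for diffusion-generated samples, and the hypothetical names `learnability_scores`, `train_linear`, and `build_incrementally`.

```python
import numpy as np

def softmax_xent(logits, labels):
    """Per-sample cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def learnability_scores(current_w, reference_w, x, y):
    """Assumed score: high when the current model still gets a sample wrong
    but a reference model handles it well."""
    return softmax_xent(x @ current_w, y) - softmax_xent(x @ reference_w, y)

def train_linear(x, y, num_classes, steps=200, lr=0.5):
    """Plain gradient descent on a linear softmax classifier, from zero weights."""
    w = np.zeros((x.shape[1], num_classes))
    for _ in range(steps):
        logits = x @ w
        shifted = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(shifted)
        probs /= probs.sum(axis=1, keepdims=True)
        probs[np.arange(len(y)), y] -= 1.0  # gradient of cross-entropy w.r.t. logits
        w -= lr * (x.T @ probs) / len(y)
    return w

def build_incrementally(pool_x, pool_y, reference_w, num_classes,
                        stages=3, per_stage=4):
    """Grow the selected set stage by stage, re-scoring the pool each time."""
    available = np.ones(len(pool_x), dtype=bool)
    chosen = []
    w = np.zeros((pool_x.shape[1], num_classes))
    for _ in range(stages):
        scores = learnability_scores(w, reference_w, pool_x, pool_y)
        scores[~available] = -np.inf            # never pick a sample twice
        picked = np.argsort(scores)[::-1][:per_stage]
        available[picked] = False
        chosen.extend(picked.tolist())
        # Retrain on everything selected so far; the next stage is scored
        # against this updated model.
        w = train_linear(pool_x[chosen], pool_y[chosen], num_classes)
    return chosen, w

# Toy data: a synthetic pool labelled by a random linear rule (an assumption).
rng = np.random.default_rng(0)
pool_x = rng.normal(size=(40, 5))
pool_y = (pool_x @ rng.normal(size=(5, 2))).argmax(axis=1)
reference_w = train_linear(pool_x, pool_y, num_classes=2)
chosen, w = build_incrementally(pool_x, pool_y, reference_w, num_classes=2)
```

Because the pool is re-scored after each retraining step, samples the current model already fits tend to score low in later stages, which is one simple way to discourage picking overlapping samples.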

Code & Models

Repositories

https://jachansantiago.github.io/learnability-guided-distillation
