Curriculum Dataset Distillation

Zhiheng Ma; Anjia Cao; Funing Yang; Yihong Gong; Xing Wei

arXiv:2405.09150 · cs.CV · July 14, 2025 · 1 citation

Open Access

TL;DR

This paper introduces a scalable curriculum-based dataset distillation framework that improves the quality and generalization of synthetic datasets, achieving state-of-the-art results on large-scale image datasets with reduced computational costs.

Contribution

The paper proposes a curriculum-driven, adversarially optimized approach to dataset distillation that improves the scalability of distillation and the diversity and robustness of the resulting synthetic datasets.
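
The contribution names a curriculum-driven approach, and the abstract describes a curriculum that moves from simple to complex. The exact curriculum is not spelled out on this page, so what follows is only a generic sketch of easy-to-hard ordering as used in curriculum learning: samples are sorted by a difficulty score and split into consecutive stages. The function `curriculum_stages` and the choice of difficulty score (for example, a reference model's loss) are hypothetical assumptions, not the authors' procedure.

```python
# Hypothetical sketch: order samples from easy to hard and split them into
# curriculum stages. The difficulty score is an assumption (e.g. the loss a
# reference model assigns to each sample); it is not taken from the paper.
from typing import List, Sequence


def curriculum_stages(difficulty: Sequence[float], num_stages: int) -> List[List[int]]:
    """Return index lists, one per stage, ordered from easiest to hardest.

    Samples are sorted by ascending difficulty and split into `num_stages`
    contiguous chunks whose sizes differ by at most one.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Stable sort of sample indices by their difficulty score.
    order = sorted(range(len(difficulty)), key=lambda i: difficulty[i])
    base, extra = divmod(len(order), num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        # Spread any remainder over the earliest stages.
        size = base + (1 if s < extra else 0)
        stages.append(order[start:start + size])
        start += size
    return stages


if __name__ == "__main__":
    # Toy difficulty scores for six samples (higher means harder).
    scores = [0.9, 0.1, 0.5, 0.3, 0.8, 0.2]
    for stage, idx in enumerate(curriculum_stages(scores, 3)):
        print(stage, idx)
```

With the toy scores, stage 0 holds samples 1 and 5, stage 1 holds samples 3 and 2, and stage 2 holds samples 4 and 0. A staged split is used here only because it is the simplest way to express "simple to complex"; a real curriculum could instead re-score samples between stages.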

Findings

01. Achieves an 11.1% improvement on Tiny-ImageNet

02. Achieves a 9.0% improvement on ImageNet-1K

03. Achieves a 7.3% improvement on ImageNet-21K

Abstract

Most dataset distillation methods struggle to scale to large datasets because of their substantial computational and memory requirements. Recent research has begun to explore scalable disentanglement methods. However, there are still performance bottlenecks and room for optimization in this direction. In this paper, we present a curriculum-based dataset distillation framework aiming to harmonize performance and scalability. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the tendency of previous methods to generate homogeneous and simplistic images, and we do so at a manageable computational cost. Furthermore, we apply adversarial optimization to the synthetic images to further improve their representativeness and safeguard against their…
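
The abstract mentions adversarial optimization of the synthetic images, but the objective is not given in the text shown here. To illustrate what an adversarial update on an input looks like in general, this sketch swaps in a single fast gradient sign method (FGSM) step on a toy logistic-regression classifier: each input coordinate moves by `eps` in the direction that increases the loss. The model, the loss, the step size `eps`, and the helper names (`predict`, `logistic_loss`, `fgsm_step`) are all hypothetical and should not be read as the paper's method.

```python
# Hypothetical sketch: a single FGSM-style step that nudges an input so that
# a toy logistic-regression classifier's loss goes up. Generic illustration
# only; the model, loss, and step size are assumptions, not the paper's.
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(x: Sequence[float], w: Sequence[float], b: float) -> float:
    # Probability of class 1 under the toy model p = sigmoid(w . x + b).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def logistic_loss(x: Sequence[float], w: Sequence[float], b: float, y: int) -> float:
    # Binary cross-entropy for a label y in {0, 1}.
    p = predict(x, w, b)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sign(v: float) -> int:
    # Returns 1, 0, or -1.
    return (v > 0) - (v < 0)


def fgsm_step(x: Sequence[float], w: Sequence[float], b: float, y: int, eps: float) -> List[float]:
    # For binary cross-entropy, d(loss)/d(x_i) = (p - y) * w_i.
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    # Move each coordinate by eps in the direction that increases the loss.
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


if __name__ == "__main__":
    x, w, b, y = [1.0, -0.5], [2.0, -1.0], 0.0, 1
    x_adv = fgsm_step(x, w, b, y, eps=0.1)
    print(x_adv, logistic_loss(x, w, b, y), logistic_loss(x_adv, w, b, y))
```

With x = [1.0, -0.5], w = [2.0, -1.0], b = 0 and y = 1, the gradient (p - y) * w has signs (-, +), so a step of eps = 0.1 moves the input to roughly [0.9, -0.4] and raises the loss.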

Taxonomy

Topics: Online Learning and Analytics · Machine Learning and Data Classification