Curriculum Dataset Distillation
Zhiheng Ma, Anjia Cao, Funing Yang, Yihong Gong, Xing Wei

TL;DR
This paper introduces a scalable curriculum-based dataset distillation framework that improves the quality and generalization of synthetic datasets, achieving state-of-the-art results on large-scale image datasets with reduced computational costs.
Contribution
It proposes a novel curriculum-driven and adversarial optimization approach for dataset distillation, enhancing scalability, diversity, and robustness of synthetic datasets.
Findings
Achieves 11.1% improvement on Tiny-ImageNet
Achieves 9.0% improvement on ImageNet-1K
Achieves 7.3% improvement on ImageNet-21K
Abstract
Most dataset distillation methods struggle to accommodate large-scale datasets due to their substantial computational and memory requirements. Recent research has begun to explore scalable disentanglement methods. However, there are still performance bottlenecks and room for optimization in this direction. In this paper, we present a curriculum-based dataset distillation framework aiming to harmonize performance and scalability. This framework strategically distills synthetic images, adhering to a curriculum that transitions from simple to complex. By incorporating curriculum evaluation, we address the issue of previous methods generating images that tend to be homogeneous and simplistic, doing so at a manageable computational cost. Furthermore, we introduce adversarial optimization towards synthetic images to further improve their representativeness and safeguard against their…
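The "simple to complex" ordering described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the summary does not say how difficulty is measured or how the curriculum is scheduled, so the `difficulty` scores, the `curriculum_stages` helper, and the even split into stages are all assumptions made only for illustration.

```python
from typing import List, Sequence


def curriculum_stages(difficulty: Sequence[float], num_stages: int) -> List[List[int]]:
    """Split sample indices into stages ordered from easiest to hardest.

    Hypothetical helper: `difficulty` is any per-sample score where a lower
    value means a simpler sample. The paper's actual criterion is not given
    in this summary.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Sort indices by ascending difficulty score (lower = simpler).
    order = sorted(range(len(difficulty)), key=lambda i: difficulty[i])
    # Distribute the sorted indices across stages as evenly as possible.
    base, extra = divmod(len(order), num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)
        stages.append(order[start:start + size])
        start += size
    return stages


# Made-up difficulty scores for five samples (assumption, for illustration).
scores = [0.9, 0.1, 0.5, 0.3, 0.7]
stages = curriculum_stages(scores, num_stages=2)
```

Under these assumptions, a distillation loop would process `stages[0]` (the simplest samples) before `stages[1]`, so later synthetic images are optimized against progressively harder targets.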
Taxonomy
Topics: Online Learning and Analytics · Machine Learning and Data Classification
