CaO$_2$: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation

Haoxuan Wang; Zhenghao Zhao; Junyi Wu; Yuzhang Shang; Gaowen Liu; Yan Yan

arXiv:2506.22637 · cs.CV · July 10, 2025

TL;DR

This paper introduces CaO$_2$, a diffusion-based dataset distillation method that aligns the distillation process with evaluation objectives, resolving key inconsistencies and achieving state-of-the-art accuracy on ImageNet.

Contribution

CaO$_2$ is a novel two-stage diffusion framework that addresses objective and condition inconsistencies in dataset distillation.

Findings

01. Achieves 2.3% higher accuracy on ImageNet compared to baselines.

02. Effectively resolves objective and condition inconsistencies in diffusion-based distillation.

03. Outperforms existing methods in creating compact, high-quality datasets.

Abstract

The recent introduction of diffusion models in dataset distillation has shown promising potential in creating compact surrogate datasets for large, high-resolution target datasets, offering improved efficiency and performance over traditional bi-level/uni-level optimization methods. However, current diffusion-based dataset distillation approaches overlook the evaluation process and exhibit two critical inconsistencies in the distillation process: (1) Objective Inconsistency, where the distillation process diverges from the evaluation objective, and (2) Condition Inconsistency, leading to mismatches between generated images and their corresponding conditions. To resolve these issues, we introduce Condition-aware Optimization with Objective-guided Sampling (CaO$_2$), a two-stage diffusion-based framework that aligns the distillation process with the evaluation objective. The first stage…

Code & Models

Repositories

hatchetproject/cao2
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Diffusion