Dataset Condensation with Color Compensation
Huyu Wu, Duo Su, Junjie Hou, Guang Li

TL;DR
This paper introduces DC3, a novel dataset condensation method that enhances color diversity using diffusion models, improving performance and generalization in representation learning without semantic distortion.
Contribution
DC3 is the first to fine-tune pre-trained diffusion models with condensed datasets, addressing colorfulness and semantic fidelity in dataset condensation.
Findings
DC3 outperforms state-of-the-art methods across multiple benchmarks.
Enhanced color diversity improves downstream task performance.
High-quality datasets enable training without model collapse.
Abstract
Dataset condensation always faces a constitutive trade-off: balancing performance and fidelity under extreme compression. Existing methods struggle with two bottlenecks: image-level selection methods (Coreset Selection, Dataset Quantization) suffer from inefficient condensation, while pixel-level optimization (Dataset Distillation) introduces semantic distortion due to over-parameterization. Through empirical observations, we find that a critical problem in dataset condensation is the oversight of color's dual role as an information carrier and a basic semantic representation unit. We argue that improving the colorfulness of condensed images is beneficial for representation learning. Motivated by this, we propose DC3: a Dataset Condensation framework with Color Compensation. After a calibrated selection strategy, DC3 utilizes the latent diffusion model to enhance the color diversity of an…
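The abstract argues that more colorful condensed images help representation learning, but this excerpt does not say how DC3 measures colorfulness. As an illustration only, the sketch below assumes the widely used Hasler and Süsstrunk colorfulness metric, computed on opponent channels rg = R - G and yb = (R + G)/2 - B. The function name `colorfulness` and the list-of-RGB-tuples input are hypothetical choices for this sketch, not DC3's implementation.

```python
import math
from statistics import mean, pstdev


def colorfulness(pixels):
    """Hasler-Süsstrunk colorfulness of an image (assumed metric, not DC3's).

    pixels: a non-empty list of (R, G, B) tuples with values in [0, 255].
    Returns 0.0 for a perfectly gray image; larger values mean more color.
    """
    # Opponent color channels: red-green and yellow-blue.
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    # Spread of the opponent channels (population standard deviation).
    std_root = math.hypot(pstdev(rg), pstdev(yb))
    # Average chroma offset of the opponent channels.
    mean_root = math.hypot(mean(rg), mean(yb))
    return std_root + 0.3 * mean_root


if __name__ == "__main__":
    gray = [(128, 128, 128)] * 4
    vivid = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
    print(colorfulness(gray))                         # 0.0
    print(colorfulness(vivid) > colorfulness(gray))   # True
```

A score like this could be used to compare condensed images before and after a color-enhancement step; whether DC3 uses this or another measure is not stated in the text above.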
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Data Compression Techniques
