Label-Consistent Dataset Distillation with Detector-Guided Refinement
Yawen Zou, Guang Li, Zi Wang, Chunzhi Gu, Chao Zhang

TL;DR
This paper introduces a detector-guided dataset distillation method that refines synthetic samples to ensure label consistency and improve image quality, leading to better downstream performance.
Contribution
It proposes a novel detector-guided framework that refines synthetic data in dataset distillation by identifying and improving anomalous samples using a pre-trained detector.
Findings
Achieves state-of-the-art performance on validation sets.
Synthesizes high-quality images with richer details.
Ensures label consistency and intra-class diversity.
Abstract
Dataset distillation (DD) aims to generate a compact yet informative dataset that achieves performance comparable to the original dataset, thereby reducing demands on storage and computational resources. Although diffusion models have made significant progress in dataset distillation, the generated surrogate datasets often contain samples with label inconsistencies or insufficient structural detail, leading to suboptimal downstream performance. To address these issues, we propose a detector-guided dataset distillation framework that explicitly leverages a pre-trained detector to identify and refine anomalous synthetic samples, thereby ensuring label consistency and improving image quality. Specifically, a detector model trained on the original dataset is employed to identify anomalous images exhibiting label mismatches or low classification confidence. For each defective image, multiple…
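The identification step described in the abstract can be illustrated with a minimal sketch. It assumes the detector outputs per-class probabilities for each synthetic image, and that "low classification confidence" means the probability assigned to the intended label falls below a cutoff. The function name `flag_anomalous`, the parameter `conf_threshold`, and its default value are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def flag_anomalous(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    conf_threshold: float = 0.5,  # hypothetical cutoff, not from the paper
) -> List[int]:
    """Return indices of synthetic samples the detector considers anomalous.

    A sample is flagged if the detector's top prediction differs from the
    intended label (label mismatch) or if the probability assigned to the
    intended label is below conf_threshold (low confidence).
    """
    flagged = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Index of the class with the highest detector probability.
        predicted = max(range(len(p)), key=p.__getitem__)
        if predicted != y or p[y] < conf_threshold:
            flagged.append(i)
    return flagged


# Three synthetic images, all intended to belong to class 0.
detector_probs = [[0.9, 0.1], [0.3, 0.7], [0.55, 0.45]]
intended_labels = [0, 0, 0]
print(flag_anomalous(detector_probs, intended_labels, conf_threshold=0.6))
# prints [1, 2]
```

Here the second image is flagged for a label mismatch and the third for low confidence. How the flagged images are then refined is not covered by this sketch, since the abstract text above is truncated at that point.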
Taxonomy
Topics: Process Optimization and Integration
