Label-Consistent Dataset Distillation with Detector-Guided Refinement

Yawen Zou; Guang Li; Zi Wang; Chunzhi Gu; Chao Zhang

arXiv:2507.13074 · cs.CV · February 19, 2026

Open Access

TL;DR

This paper introduces a detector-guided dataset distillation method that refines synthetic samples to ensure label consistency and improve image quality, leading to better downstream performance.

Contribution

It proposes a novel detector-guided framework that refines synthetic data in dataset distillation by identifying and improving anomalous samples using a pre-trained detector.

Findings

01. Achieves state-of-the-art performance on validation sets.

02. Synthesizes high-quality images with richer details.

03. Ensures label consistency and intra-class diversity.

Abstract

Dataset distillation (DD) aims to generate a compact yet informative dataset that achieves performance comparable to the original dataset, thereby reducing demands on storage and computational resources. Although diffusion models have made significant progress in dataset distillation, the generated surrogate datasets often contain samples with label inconsistencies or insufficient structural detail, leading to suboptimal downstream performance. To address these issues, we propose a detector-guided dataset distillation framework that explicitly leverages a pre-trained detector to identify and refine anomalous synthetic samples, thereby ensuring label consistency and improving image quality. Specifically, a detector model trained on the original dataset is employed to identify anomalous images exhibiting label mismatches or low classification confidence. For each defective image, multiple…
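
The detection step described in the abstract (flagging synthetic images whose predicted label disagrees with the intended one, or whose classification confidence is low) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `flag_anomalous`, the `min_confidence` threshold and its default value, and the use of plain probability lists in place of a real detector model are all assumptions made for illustration.

```python
from typing import List, Sequence


def flag_anomalous(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    min_confidence: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> List[int]:
    """Return indices of synthetic samples a detector would flag.

    probs[i]  : detector's class-probability vector for synthetic image i
    labels[i] : class the image was generated to represent
    A sample is flagged on a label mismatch or on low confidence.
    """
    flagged = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        # Class with the highest detector probability.
        predicted = max(range(len(p)), key=p.__getitem__)
        mismatch = predicted != y
        # Assumption: "confidence" means the probability of the intended class.
        low_confidence = p[y] < min_confidence
        if mismatch or low_confidence:
            flagged.append(i)
    return flagged


# Three synthetic images, all intended to belong to class 0.
detector_probs = [[0.90, 0.10], [0.30, 0.70], [0.55, 0.45]]
intended = [0, 0, 0]
print(flag_anomalous(detector_probs, intended, min_confidence=0.6))  # [1, 2]
```

In this toy input, the second image is flagged for a label mismatch and the third for low confidence. How the flagged images are then refined is not covered here, since the abstract is truncated at that point.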

Taxonomy

Topics: Process Optimization and Integration