Difficulty Controlled Diffusion Model for Synthesizing Effective Training Data

Zerun Wang; Jiafeng Mao; Xueting Wang; Toshihiko Yamasaki

arXiv:2411.18109 · cs.CV · January 8, 2026


TL;DR

This paper introduces a difficulty-controlled diffusion model that generates challenging training samples. These samples improve model performance while reducing the amount of synthetic data and computation required.

Contribution

It presents a novel method for controlling sample difficulty during generation, which enables the efficient creation of hard samples that make training more effective.

Findings

01. Achieves higher performance with only 10% additional synthetic data.

02. Reduces generation time by 63.4 GPU hours compared to the previous state-of-the-art (SOTA) method.

03. Provides visualizations of category-specific hard factors.

Abstract

Generative models have become a powerful tool for synthesizing training data in computer vision tasks. Current approaches focus solely on aligning generated images with the target dataset distribution. As a result, they capture only the common features of the real dataset and mostly generate 'easy samples', which models trained on real data have already learned well. In contrast, the rare 'hard samples', which have atypical features but are crucial for improving performance, cannot be generated effectively. Consequently, these approaches must synthesize large volumes of data to yield appreciable performance gains, and even then the improvement remains limited. To overcome this limitation, we present a novel method that learns to control the learning difficulty of samples during generation while also achieving domain alignment. Thus, it can efficiently generate valuable 'hard samples' that yield…
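The abstract says difficulty is used as a controllable factor during generation but does not say how learning difficulty is measured. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes that a sample's difficulty is one minus the probability that a classifier trained on the real data assigns to its true label. It also assumes that these scores are quantized into a few discrete levels, which could then serve as an extra conditioning input to a generator. The names `difficulty_score` and `difficulty_bins`, and the choice of four levels, are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def difficulty_score(logits: Sequence[float], label: int) -> float:
    """Hypothetical difficulty measure (an assumption, not from the paper):
    1 - probability of the true class under a classifier trained on real data.
    Scores near 0 are 'easy' samples; scores near 1 are 'hard' samples."""
    return 1.0 - softmax(logits)[label]


def difficulty_bins(scores: Sequence[float], num_bins: int) -> List[int]:
    """Quantize scores in [0, 1] into integer levels 0..num_bins-1,
    e.g. to be embedded as an extra condition for a generator (hypothetical)."""
    return [min(int(s * num_bins), num_bins - 1) for s in scores]


# Usage: three samples that all belong to class 0, scored by a toy classifier.
logits = [[4.0, 0.0, 0.0], [0.5, 0.4, 0.3], [0.0, 3.0, 0.0]]
labels = [0, 0, 0]
scores = [difficulty_score(l, y) for l, y in zip(logits, labels)]
levels = difficulty_bins(scores, num_bins=4)
print(levels)  # [0, 2, 3]: confident-correct, uncertain, confidently wrong
```

Discrete levels are used here because they are simple to embed like a class label. A continuous score fed through a learned projection would be an equally plausible design.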

