Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios

Siyue Yao; Mingjie Sun; Eng Gee Lim; Ran Yi; Baojiang Zhong; Moncef Gabbouj

arXiv:2507.09915 · cs.CV · November 5, 2025

TL;DR

Crucial-Diff is a domain-agnostic diffusion framework that synthesizes crucial, hard-to-detect training samples to improve detection and segmentation in data-scarce scenarios, outperforming existing synthetic-data methods.

Contribution

The paper introduces a unified, efficient approach that combines the Scene Agnostic Feature Extractor (SAFE) and Weakness Aware Sample Miner (WASM) modules to generate diverse, high-quality training data targeting the downstream model's weaknesses.

Findings

01. Achieves 83.63% pixel-level AP on the MVTec dataset.

02. Reaches 81.64% mIoU on the polyp dataset.

03. Outperforms existing synthetic data methods.
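
For readers unfamiliar with the mIoU metric cited in the findings, the sketch below shows one common way to compute mean intersection-over-union on label masks. It is a minimal illustration, not the paper's evaluation code: the function name `mean_iou`, the choice to skip classes absent from both masks, and the toy labels (0 = background, 1 = polyp) are all assumptions made here for the example.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes present in pred or target.

    pred and target are flat lists of integer class labels, one per pixel.
    This averaging convention is an assumption; papers differ on details.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both masks agree on class c
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either mask assigns class c
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 mask, flattened row by row (hypothetical labels)
pred = [0, 1, 1, 0, 1, 0]
target = [0, 1, 0, 0, 1, 1]
score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.2f}")
```

In this toy case each class overlaps on 2 of the 4 pixels in its union, so both per-class IoU values are 0.5 and so is their mean. A reported figure such as 81.64% corresponds to this quantity averaged over a full test set, under whatever protocol the authors used.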

Abstract

Data scarcity in domains such as medicine, industry, and autonomous driving leads to model overfitting and dataset imbalance, hindering effective detection and segmentation performance. Existing studies employ generative models to synthesize additional training samples to mitigate data scarcity. However, these synthetic samples are repetitive or simplistic and fail to provide "crucial information" that targets the downstream model's weaknesses. Additionally, these methods typically require separate training for different objects, leading to computational inefficiencies. To address these issues, we propose Crucial-Diff, a domain-agnostic framework designed to synthesize crucial samples. Our method integrates two key modules. The Scene Agnostic Feature Extractor (SAFE) utilizes a unified feature extractor to capture target information. The Weakness Aware Sample Miner…

Taxonomy

Topics: Image Retrieval and Classification Techniques