Unbiased Object Detection Beyond Frequency with Visually Prompted Image Synthesis
Xinhao Cai, Liulei Li, Gensheng Pei, Tao Chen, Jinshan Pan, Yazhou Yao, Wenguan Wang

TL;DR
This paper introduces a generation-based debiasing framework for object detection that uses a novel representation score and visual blueprints to improve detection of rare objects and generate high-quality, complex scenes.
Contribution
It proposes a new debiasing method that guides data generation beyond frequency, using representation scores and visual blueprints for better scene synthesis and detection performance.
Findings
Improves detection of rare objects by 4.4/3.6 mAP.
Surpasses prior models by 15.9 mAP in layout accuracy.
Significantly narrows performance gap for underrepresented object groups.
Abstract
This paper presents a generation-based debiasing framework for object detection. Prior debiasing methods are often limited by the representation diversity of samples, while naive generative augmentation often preserves the biases it aims to solve. Moreover, our analysis reveals that simply generating more data for rare classes is suboptimal due to two core issues: i) instance frequency is an incomplete proxy for the true data needs of a model, and ii) current layout-to-image synthesis lacks the fidelity and control to generate high-quality, complex scenes. To overcome this, we introduce the representation score (RS) to diagnose representational gaps beyond mere frequency, guiding the creation of new, unbiased layouts. To ensure high-quality synthesis, we replace ambiguous text prompts with a precise visual blueprint and employ a generative alignment strategy, which fosters communication…
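The abstract's claim that instance frequency is an incomplete proxy for a model's data needs can be shown with a toy sketch. This is not the paper's representation score, whose definition is not given on this page. `mixed_priority` is a hypothetical stand-in that blends inverse-frequency rarity with model weakness (one minus a per-group AP). The group keys, AP values and `alpha` weight are illustrative assumptions.

```python
from typing import Dict, Hashable


def _normalize(raw: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Rescale non-negative weights so that they sum to 1."""
    if not raw:
        return {}
    total = sum(raw.values())
    if total <= 0:
        # Degenerate case (e.g. every group has AP = 1): fall back to uniform weights.
        return {g: 1.0 / len(raw) for g in raw}
    return {g: w / total for g, w in raw.items()}


def frequency_priority(counts: Dict[Hashable, int]) -> Dict[Hashable, float]:
    """Frequency-only baseline: weight each group by its inverse instance count."""
    if any(c <= 0 for c in counts.values()):
        raise ValueError("every group needs at least one instance")
    return _normalize({g: 1.0 / c for g, c in counts.items()})


def mixed_priority(
    counts: Dict[Hashable, int],
    group_ap: Dict[Hashable, float],
    alpha: float = 0.5,
) -> Dict[Hashable, float]:
    """Hypothetical need score, NOT the paper's representation score.

    A convex mix of inverse-frequency rarity and model weakness (1 - per-group AP),
    so a frequent group that the detector still handles poorly can outrank a rare one.
    """
    rarity = frequency_priority(counts)
    weakness = _normalize({g: 1.0 - group_ap[g] for g in counts})
    return {g: alpha * rarity[g] + (1.0 - alpha) * weakness[g] for g in counts}
```

For example, with instance counts of 500, 300 and 200 and per-group AP of 0.2, 0.6 and 0.8, inverse frequency ranks the 200-instance group first. The mixed score instead ranks the 500-instance group first, because it is common but poorly detected. Because both components are normalized distributions, the mixed weights also sum to 1 for any `alpha` in [0, 1].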
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper introduces a novel approach to diagnosing and addressing dataset biases by integrating representation scores with visual blueprints and generative alignment. This method is innovative as it overcomes limitations found in existing techniques. 2. The research is characterized by rigorous experiments and comprehensive analysis. The proposed methods are well-implemented, validated, and demonstrate excellent performance improvements. 3. The contributions are significant as they tackle a
I appreciate the great effort in this paper, with its clear motivation, thoughtful analyses, detailed strategies and comprehensive evaluations. Even so, after carefully considering the contributions of this work, I have some main concerns about the insights the paper conveys, in addition to some concerns about details in the paper. 1. *The insights* - a) The study for the motivation in Section is too empirical and heuristic. These observations and analyses mainly focus on the performance comparison. The perfor
1. The performance is good. The improvement is substantial compared to previous methods, such as GeoDiffusion. 2. This method is simple yet effective. The two main contributions, the representation score and the visual blueprint, are easy to understand and significantly improve performance. 3. The motivation is clear. The observations in Sec. 2 explain why frequency is not enough and why fidelity is important.
1. The paper lacks a figure describing the overall pipeline in detail. Figures 2 and 3 explain the layout recalibration and blueprint construction; however, no figure explains the complete pipeline, which can be confusing. 2. The effect of Generative Alignment is negligible. Can you explain its necessity? 3. There is no connection between the two contributions in this paper, which makes it read as incremental, industry-focused work rather than a cohesive academic study.
- The problem setting is fairly clear, including enhancements for both layout and image generation.
- The preliminary exploration of layout frequency is extensive.
- The authors evaluate both fidelity and trainability on COCO and NuImages.
- About the Representation Score:
  - In line 161, since the box size s and horizontal position u are both continuous, do you quantize them to construct the RS groups?
  - In Sec. 2, the authors conduct extensive experiments to show that frequency is not the best metric for layout selection; however, these experiments do not directly connect to the more complicated definition of RS in Eq. 1.
  - Considering that `Freq-Aware Gen` is still a solid baseline, I would expect to see a comparison
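The reviewer's quantization question can be made concrete with a small sketch. Assuming boxes are given as (class, x_min, y_min, width, height) in pixels, one plausible discretization maps a relative box size s and a normalized horizontal center u into equal-width bins, then counts instances per (class, size bin, position bin) group. The choice s = sqrt(wh / WH), the bin counts and the function names are hypothetical illustrations, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical box format: (class, x_min, y_min, width, height) in pixels.
Box = Tuple[str, float, float, float, float]


def to_bin(value: float, num_bins: int) -> int:
    """Map a value in [0, 1] to one of num_bins equal-width bins (1.0 falls in the last bin)."""
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * num_bins), num_bins - 1)


def group_key(box: Box, img_w: float, img_h: float,
              size_bins: int = 3, pos_bins: int = 3) -> Tuple[str, int, int]:
    """Hypothetical group key: (class, size bin, horizontal-position bin)."""
    cls, x, _, w, h = box
    s = math.sqrt((w * h) / (img_w * img_h))  # relative box size in [0, 1]
    u = (x + w / 2.0) / img_w                 # normalized horizontal center in [0, 1]
    return cls, to_bin(s, size_bins), to_bin(u, pos_bins)


def count_groups(boxes: Iterable[Box], img_w: float, img_h: float,
                 size_bins: int = 3, pos_bins: int = 3) -> Counter:
    """Count instances per discretized (class, size, position) group."""
    return Counter(group_key(b, img_w, img_h, size_bins, pos_bins) for b in boxes)
```

Equal-width bins keep the sketch simple; quantile-based edges would be an equally plausible choice when box sizes are heavily skewed toward small objects.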
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Neural Network Applications
