GenDet: Painting Colored Bounding Boxes on Images via Diffusion Model for Object Detection

Chen Min; Chengyang Li; Fanjie Kong; Qi Zhu; Dawei Zhao; Liang Xiao

arXiv:2601.07273·cs.CV·January 13, 2026

Open Access

TL;DR

GenDet introduces a novel object detection framework that formulates detection as an image generation task using a diffusion model, enabling precise bounding box and category prediction within a generative paradigm.

Contribution

It pioneers the use of large-scale pre-trained diffusion models for object detection, integrating semantic constraints for accurate bounding box generation in the image space.

Findings

01 Achieves competitive accuracy with traditional detectors

02 Provides flexible and controllable detection outputs

03 Bridges generative models with discriminative detection tasks

Abstract

This paper presents GenDet, a framework that recasts object detection as an image generation task. Unlike traditional detectors, GenDet conditions on the input image and directly generates bounding boxes with semantic annotations in the original image space. It builds a conditional generation architecture on the large-scale pre-trained Stable Diffusion model and formulates the detection task as semantic constraints within the latent space. This enables precise control over bounding box positions and category attributes while preserving the flexibility of the generative model. The method bridges the gap between generative models and discriminative tasks, offering a new perspective for constructing unified visual understanding systems. Systematic experiments…
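To make the idea of "painting colored bounding boxes" concrete, the sketch below shows one way a detection could be written into image space and read back out. It is only an illustration under stated assumptions, not the paper's pipeline: the palette `PALETTE`, the helpers `paint_boxes` and `decode_boxes`, the outline thickness, and the exact-color decoding rule are all hypothetical, and no diffusion model is involved. In this toy setup each category maps to one color, and decoding recovers at most one box per category.

```python
import numpy as np

# Hypothetical palette: one RGB color per category (an assumption, not from the paper).
PALETTE = {"person": (255, 0, 0), "car": (0, 255, 0), "dog": (0, 0, 255)}


def paint_boxes(image, boxes, thickness=2):
    """Draw each (label, x0, y0, x1, y1) box as a colored outline on a copy of image.

    Coordinates are inclusive pixel indices; the color encodes the category.
    """
    out = image.copy()
    t = thickness
    for label, x0, y0, x1, y1 in boxes:
        color = PALETTE[label]
        out[y0:y0 + t, x0:x1 + 1] = color          # top edge
        out[y1 - t + 1:y1 + 1, x0:x1 + 1] = color  # bottom edge
        out[y0:y1 + 1, x0:x0 + t] = color          # left edge
        out[y0:y1 + 1, x1 - t + 1:x1 + 1] = color  # right edge
    return out


def decode_boxes(painted):
    """Recover one box per palette color from the extent of exactly matching pixels."""
    found = []
    for label, color in PALETTE.items():
        mask = np.all(painted == np.array(color, dtype=painted.dtype), axis=-1)
        if mask.any():
            ys, xs = np.nonzero(mask)
            found.append((label, int(xs.min()), int(ys.min()),
                          int(xs.max()), int(ys.max())))
    return found


# Toy usage: a flat gray image stands in for the input photo.
image = np.full((64, 64, 3), 128, dtype=np.uint8)
boxes = [("person", 5, 5, 30, 40), ("car", 35, 10, 60, 50)]
painted = paint_boxes(image, boxes)
decoded = decode_boxes(painted)
```

A generated image would not contain exact palette colors, so a real decoder would need a tolerance (for example, nearest-color matching) and a way to separate several instances of the same category; both are left out of this sketch.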

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Multimodal Machine Learning Applications