Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

Wenhao Li; Zimeng Wu; Yu Wu; Zehua Fu; Jiaxin Chen

arXiv:2604.02966·cs.CV·April 6, 2026

TL;DR

UAVGen is a diffusion-based layout-to-image generation framework for UAV-based object detection. It synthesizes labeled training images while targeting the artifacts that earlier methods produce near the layout boundaries of tiny objects, with the goal of improving detection accuracy when annotated data is limited.

Contribution

The paper presents UAVGen, a layout-to-image generation framework that combines a Visual Prototype Conditioned Diffusion Model (VPC-DM) with a Focal Region Enhanced Data Pipeline (FRE-DP) to improve UAV-based object detection.

Findings

01 Outperforms state-of-the-art image synthesis methods in UAV detection tasks.

02 Enhances detection accuracy across various detectors when integrated.

03 Produces high-fidelity, focused synthetic images that improve model training.

Abstract

Unmanned aerial vehicle (UAV) based object detection is a critical but challenging task when applied in dynamically changing scenarios with limited annotated training data. Layout-to-image generation approaches have proved effective in improving detection accuracy by synthesizing labeled images with diffusion models. However, they frequently produce artifacts, especially near the layout boundaries of tiny objects, which substantially limits their performance. To address these issues, we propose UAVGen, a novel layout-to-image generation framework tailored for UAV-based object detection. Specifically, UAVGen designs a Visual Prototype Conditioned Diffusion Model (VPC-DM) that constructs representative instances for each class and integrates them into latent embeddings for high-fidelity object generation. Moreover, a Focal Region Enhanced Data Pipeline (FRE-DP) is…

Code & Models

Repositories

Sirius-Li/UAVGen (GitHub)
