Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task Prompts
Mingning Guo, Mengwei Wu, Shaoxian Li, Haifeng Li, Chao Tao

TL;DR
This paper introduces AerialVP, a framework that enhances task prompts for UAV image perception by extracting auxiliary information, significantly improving model performance across various perception tasks and conditions.
Contribution
AerialVP is the first agent framework for task prompt enhancement in UAV perception, addressing limitations of traditional VLMs with a multi-stage prompt improvement process.
Findings
AerialVP improves perception accuracy across multiple UAV tasks.
Enhanced prompts lead to more stable and substantial performance gains.
The framework is effective on both open-source and proprietary models.
Abstract
Existing image perception methods based on VLMs generally follow a paradigm wherein models extract and analyze image content based on user-provided textual task prompts. However, such methods face limitations when applied to UAV imagery, which presents challenges like target confusion, scale variations, and complex backgrounds. These challenges arise because VLMs' understanding of image content depends on the semantic alignment between visual and textual tokens. When the task prompt is simplistic and the image content is complex, achieving effective alignment becomes difficult, limiting the model's ability to focus on task-relevant information. To address this issue, we introduce AerialVP, the first agent framework for task prompt enhancement in UAV image perception. AerialVP proactively extracts multi-dimensional auxiliary information from UAV images to enhance task prompts, overcoming…
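The core idea in the abstract, enriching a simplistic task prompt with auxiliary information extracted from the image, can be illustrated with a minimal sketch. This is not the AerialVP implementation: the function name `enhance_prompt`, the cue names (`scene`, `target_scale`), and the prompt layout are all assumptions made for illustration, and the extraction step itself is left out (the cues are passed in as plain strings).

```python
def enhance_prompt(task_prompt: str, aux_info: dict[str, str]) -> str:
    """Append non-empty auxiliary cues to a task prompt.

    Hypothetical helper: the cue names and the text layout are
    illustrative assumptions, not taken from the paper.
    """
    # Keep only cues that carry some text after trimming whitespace.
    cue_lines = [
        f"- {name}: {value.strip()}"
        for name, value in aux_info.items()
        if value and value.strip()
    ]
    if not cue_lines:
        # Nothing to add: return the original prompt unchanged.
        return task_prompt
    header = "Additional context about the image:"
    return task_prompt.rstrip() + "\n\n" + header + "\n" + "\n".join(cue_lines)


if __name__ == "__main__":
    # Example cues a hypothetical extraction stage might produce.
    cues = {
        "scene": "parking lot seen from directly above",
        "target_scale": "vehicles occupy only a few dozen pixels each",
    }
    print(enhance_prompt("Count the vehicles in the image.", cues))
```

The enhanced string would then be passed to a VLM in place of the original task prompt. How AerialVP actually selects and formats its auxiliary information is not specified in the text above.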
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
