Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology
Yatai Ji, Zhengqiu Zhu, Yong Zhao, Beidan Liu, Chen Gao, Yihao Zhao, Sihang Qiu, Yue Hu, Quanjun Yin, Yong Li

TL;DR
This paper introduces CityAVOS, a new benchmark dataset for urban UAV object search, and proposes PRPSearcher, an agentic method using multimodal large language models to improve autonomous search performance.
Contribution
The paper presents CityAVOS, described as the first benchmark dataset for autonomous search of common urban objects, and PRPSearcher, an agentic approach for UAVs powered by multimodal large language models that aims to improve search success rate and efficiency.
Findings
PRPSearcher outperforms existing baselines in success rate and efficiency.
The dataset comprises 2,420 tasks across six object categories with varying difficulty levels, enabling comprehensive evaluation of UAV search capabilities.
Experimental results highlight the need for improved semantic reasoning.
Abstract
Aerial Visual Object Search (AVOS) tasks in urban environments require Unmanned Aerial Vehicles (UAVs) to autonomously search for and identify target objects using visual and textual cues without external guidance. Existing approaches struggle in complex urban environments due to redundant semantic processing, similar object distinction, and the exploration-exploitation dilemma. To bridge this gap and support the AVOS task, we introduce CityAVOS, the first benchmark dataset for autonomous search of common urban objects. This dataset comprises 2,420 tasks across six object categories with varying difficulty levels, enabling comprehensive evaluation of UAV agents' search capabilities. To solve the AVOS tasks, we also propose PRPSearcher (Perception-Reasoning-Planning Searcher), a novel agentic method powered by multi-modal large language models (MLLMs) that mimics human three-tier…
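The abstract says the benchmark spans several object categories and difficulty levels, and the findings compare methods by success rate. As a rough illustration of how such a breakdown could be computed, here is a minimal sketch. It is not the authors' evaluation code: the `EpisodeResult` record, its field names, and the category and difficulty labels are all hypothetical, and the paper may define its metrics differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeResult:
    """One search episode. Every field name here is a hypothetical placeholder."""
    category: str    # object category label (hypothetical)
    difficulty: str  # difficulty level label (hypothetical)
    success: bool    # whether the agent located the target


def success_rate_by_group(results, key):
    """Return {group: fraction of successful episodes}, grouping by key(result)."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for result in results:
        group = key(result)
        totals[group] += 1
        wins[group] += int(result.success)
    return {group: wins[group] / totals[group] for group in totals}


# Made-up episodes, only to show the call pattern.
demo_results = [
    EpisodeResult("category_a", "easy", True),
    EpisodeResult("category_a", "hard", False),
    EpisodeResult("category_b", "easy", True),
    EpisodeResult("category_a", "easy", False),
]

by_category = success_rate_by_group(demo_results, lambda r: r.category)
by_difficulty = success_rate_by_group(demo_results, lambda r: r.difficulty)
```

Grouping through a key function keeps the same helper usable for any split (category, difficulty, or a pair of both) without assuming how the benchmark labels its tasks.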
Taxonomy
Topics: Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization
Methods: Semi-Pseudo-Label
