Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology
Xiangyu Wang, Donglin Yang, Ziqin Wang, Hohin Kwan, Jinyu Chen, Wenjun Wu, Hongsheng Li, Yue Liao, Si Liu

TL;DR
This paper introduces a comprehensive framework for UAV vision-language navigation, including a new platform, dataset, benchmark, and a multimodal large language model, addressing the unique challenges of aerial navigation tasks.
Contribution
It presents the OpenUAV platform, a UAV-specific VLN dataset, a guidance-assisted benchmark, and a novel UAV navigation LLM, advancing realistic UAV navigation research.
Findings
The proposed method significantly outperforms baseline models.
There is a large gap between current models and human performance.
The UAV-Need-Help benchmark effectively evaluates guidance-based navigation.
Abstract
Developing agents capable of navigating to a target location based on language instructions and visual information, known as vision-language navigation (VLN), has attracted widespread interest. Most research has focused on ground-based agents, while UAV-based VLN remains relatively underexplored. Recent efforts in UAV vision-language navigation predominantly adopt ground-based VLN settings, relying on predefined discrete action spaces and neglecting the inherent disparities in agent movement dynamics and the complexity of navigation tasks between ground and aerial environments. To address these disparities and challenges, we propose solutions from three perspectives: platform, benchmark, and methodology. To enable realistic UAV trajectory simulation in VLN tasks, we propose the OpenUAV platform, which features diverse environments, realistic flight control, and extensive algorithmic…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms
