Grounded Vision-Language Navigation for UAVs with Open-Vocabulary Goal Understanding
Yuhang Zhang, Haosheng Yu, Jiaping Xiao, and Mir Feroskhan

TL;DR
This paper introduces VLFly, a novel UAV navigation framework that uses vision-language models and large language models to understand open-vocabulary instructions and navigate complex environments without localization or active ranging.
Contribution
VLFly is the first UAV navigation system that integrates large language models and vision-language retrieval for open-vocabulary goal understanding and generalized navigation without localization.
Findings
Outperforms all baselines in diverse simulation environments.
Demonstrates robust real-world navigation with abstract language instructions.
Operates without localization or active ranging sensors.
Abstract
Vision-and-language navigation (VLN) is a long-standing challenge in autonomous robotics, aiming to empower agents with the ability to follow human instructions while navigating complex environments. Two key bottlenecks remain in this field: generalization to out-of-distribution environments and reliance on fixed discrete action spaces. To address these challenges, we propose Vision-Language Fly (VLFly), a framework tailored for Unmanned Aerial Vehicles (UAVs) to execute language-guided flight. Without requiring localization or active ranging sensors, VLFly outputs continuous velocity commands purely from egocentric observations captured by an onboard monocular camera. VLFly integrates three modules: an instruction encoder based on a large language model (LLM) that reformulates high-level language into structured prompts, a goal retriever powered by a vision-language model…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization
