Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap

Hanxuan Chen; Jie Zheng; Siqi Yang; Tianle Zeng; Siwei Feng; Songsheng Cheng; Ruilong Ren; Hanzhong Guo; Shuai Yuan; Xiangyue Wang; Kangli Wang; and Ji Pei

arXiv:2604.13654 · cs.RO · April 16, 2026

TL;DR

This survey reviews progress and challenges in vision-and-language navigation for UAVs operating in complex 3D environments, covering the field's technological evolution, key resources, and future research directions.

Contribution

The survey provides a structured taxonomy of UAV-VLN methods, analyzes current challenges, and proposes a research roadmap for the field.

Findings

01 Evolution from modular to foundation-model-based approaches

02 Identification of key challenges such as the sim-to-real gap and perception robustness

03 Proposal of future research directions, including multi-agent coordination

Abstract

Vision-and-Language Navigation for Unmanned Aerial Vehicles (UAV-VLN) represents a pivotal challenge in embodied artificial intelligence, focused on enabling UAVs to interpret high-level human commands and execute long-horizon tasks in complex 3D environments. This paper provides a comprehensive and structured survey of the field, from its formal task definition to the current state of the art. We establish a methodological taxonomy that charts the technological evolution from early modular and deep learning approaches to contemporary agentic systems driven by large foundation models, including Vision-Language Models (VLMs), Vision-Language-Action (VLA) models, and the emerging integration of generative world models with VLA architectures for physically grounded reasoning. The survey systematically reviews the ecosystem of essential resources: simulators, datasets, and evaluation metrics…
