AirNav: A Large-Scale UAV Vision-and-Language Navigation Dataset with Natural and Diverse Instructions

Hengxing Cai; Yijie Rao; Ligang Huang; Zanyang Zhong; Jinhan Dong; Jingjun Tan; Changhao Nai; Jue Hou; Wenhao Lu; Renxin Zhong

arXiv:2601.03707 · cs.CL · May 18, 2026

TL;DR

AirNav is a large-scale UAV vision-and-language navigation (VLN) dataset with natural instructions, built for realistic training and evaluation of UAV VLN agents. The accompanying AirVLN-R1 model reaches state-of-the-art performance on it and shows promising real-world transfer.

Contribution

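The paper introduces AirNav, a UAV VLN benchmark of 137K navigation samples with natural and diverse instructions built on real urban aerial data, and proposes AirVLN-R1, a model trained with supervised and reinforcement fine-tuning that achieves state-of-the-art results and transfers to real-world UAV flights.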

Findings

01

AirVLN-R1 achieves a 51.82% success rate on the test-unseen split.

02

The dataset includes 137K navigation samples with natural instructions.

03

Real-world UAV experiments suggest promising sim-to-real transfer.
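
The success rate quoted in finding 01 is not defined on this page. As a rough illustration only, VLN benchmarks commonly count an episode as successful when the agent stops within a fixed distance of the goal. The sketch below assumes that convention; the function name `success_rate`, the 3D point format, and the 20 m default threshold are all hypothetical and are not taken from the AirNav paper.

```python
import math
from typing import Sequence, Tuple

# (x, y, z) position in metres; the coordinate convention is an assumption.
Point = Tuple[float, float, float]


def success_rate(
    final_positions: Sequence[Point],
    goals: Sequence[Point],
    threshold_m: float = 20.0,  # hypothetical threshold, not from the paper
) -> float:
    """Return the fraction of episodes that end within threshold_m of the goal."""
    if len(final_positions) != len(goals):
        raise ValueError("final_positions and goals must have the same length")
    if not final_positions:
        return 0.0
    # An episode is a hit when the Euclidean distance to its goal is small enough.
    hits = sum(
        1
        for position, goal in zip(final_positions, goals)
        if math.dist(position, goal) <= threshold_m
    )
    return hits / len(final_positions)


if __name__ == "__main__":
    # Two toy episodes: the first stops 10 m from its goal, the second 100 m away.
    stops = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
    targets = [(0.0, 0.0, 10.0), (0.0, 0.0, 0.0)]
    print(f"success rate: {success_rate(stops, targets):.2%}")
```

Under this reading, a 51.82% success rate would mean roughly half of the test-unseen episodes end inside the success radius. The paper's actual metric definition should be checked before comparing numbers.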

Abstract

Existing UAV vision-and-language navigation (VLN) benchmarks rarely provide realistic aerial scenes, natural process-level instructions, and sufficient scale simultaneously, making it difficult to systematically train and evaluate UAV VLN agents under realistic settings. To address this, we propose AirNav, a large-scale benchmark built on real urban aerial data, comprising 137K navigation samples with natural and diverse instructions generated via a human-LLM collaborative pipeline with 10 user personas. We conduct a systematic evaluation of representative approaches on AirNav, ranging from traditional models to multimodal large language models (MLLMs), under unified metrics with open-source implementations. We further propose AirVLN-R1, trained via supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT), achieving state-of-the-art performance with a 51.82%…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning