IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments

Xu Liu; Yu Liu; Hanshuo Qiu; Yang Qirong; Zhouhui Lian

arXiv:2512.19024·cs.RO·December 23, 2025

TL;DR

IndoorUAV introduces a comprehensive benchmark and a novel UAV navigation model for vision-language tasks in complex indoor environments, addressing a gap in aerial navigation research.

Contribution

We created a large-scale indoor UAV navigation benchmark with diverse scenes, trajectories, and instructions, and proposed a new navigation model tailored for this setting.

Findings

1. Over 16,000 annotated trajectories for long-horizon VLN
2. A new UAV navigation model leveraging task decomposition
3. Benchmark facilitates research in indoor aerial vision-language navigation

Abstract

Vision-Language Navigation (VLN) enables agents to navigate in complex environments by following natural language instructions grounded in visual observations. Although most existing work has focused on ground-based robots or outdoor Unmanned Aerial Vehicles (UAVs), indoor UAV-based VLN remains underexplored, despite its relevance to real-world applications such as inspection, delivery, and search-and-rescue in confined spaces. To bridge this gap, we introduce IndoorUAV, a novel benchmark and method specifically tailored for VLN with indoor UAVs. We begin by curating over 1,000 diverse and structurally rich 3D indoor scenes from the Habitat simulator. Within these environments, we simulate realistic UAV flight dynamics to collect diverse 3D navigation trajectories manually, further enriched through data augmentation techniques. Furthermore, we design an automated annotation…

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Social Robot Interaction and HRI