IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments
Xu Liu, Yu Liu, Hanshuo Qiu, Yang Qirong, Zhouhui Lian

TL;DR
IndoorUAV introduces a comprehensive benchmark and a novel UAV navigation model for vision-language tasks in complex indoor environments, addressing a gap in aerial navigation research.
Contribution
We created a large-scale indoor UAV navigation benchmark with diverse scenes, trajectories, and instructions, and proposed a new navigation model tailored for this setting.
Findings
Over 16,000 annotated trajectories for long-horizon VLN
A new UAV navigation model leveraging task decomposition
The benchmark facilitates research on indoor aerial vision-language navigation
Abstract
Vision-Language Navigation (VLN) enables agents to navigate complex environments by following natural language instructions grounded in visual observations. Although most existing work has focused on ground-based robots or outdoor Unmanned Aerial Vehicles (UAVs), indoor UAV-based VLN remains underexplored, despite its relevance to real-world applications such as inspection, delivery, and search-and-rescue in confined spaces. To bridge this gap, we introduce IndoorUAV, a novel benchmark and method specifically tailored for VLN with indoor UAVs. We begin by curating over 1,000 diverse and structurally rich 3D indoor scenes from the Habitat simulator. Within these environments, we simulate realistic UAV flight dynamics to manually collect diverse 3D navigation trajectories, which we further enrich through data augmentation. Furthermore, we design an automated annotation…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · Social Robot Interaction and HRI
