CityNav: A Large-Scale Dataset for Real-World Aerial Navigation

Jungdae Lee; Taiki Miyanishi; Shuhei Kurita; Koya Sakamoto; Daichi Azuma; Yutaka Matsuo; Nakamasa Inoue

arXiv:2406.14240 · cs.CV · August 5, 2025 · 1 citation

TL;DR

CityNav introduces a large-scale real-world dataset for aerial vision-and-language navigation, enabling research on agents interpreting geographic and visual cues in city environments.

Contribution

We present the first large-scale real-world aerial VLN dataset, CityNav, and a methodology for creating geographic semantic maps to enhance navigation performance.

Findings

01. Semantic maps significantly improve agent navigation accuracy.

02. AerialVLN models outperform baseline methods on CityNav.

03. CityNav comprises 32,637 human demonstration trajectories covering 4.65 km² across two real cities, Cambridge and Birmingham.

Abstract

Vision-and-language navigation (VLN) aims to develop agents capable of navigating in realistic environments. While recent cross-modal training approaches have significantly improved navigation performance in both indoor and outdoor scenarios, aerial navigation over real-world cities remains underexplored primarily due to limited datasets and the difficulty of integrating visual and geographic information. To fill this gap, we introduce CityNav, the first large-scale real-world dataset for aerial VLN. Our dataset consists of 32,637 human demonstration trajectories, each paired with a natural language description, covering 4.65 km² across two real cities: Cambridge and Birmingham. In contrast to existing datasets composed of synthetic scenes such as AerialVLN, our dataset presents a unique challenge because agents must interpret spatial relationships between real-world landmarks and…

Code & Models

Repositories

water-cookie/citynav (PyTorch)

Taxonomy

Topics: Geographic Information Systems Studies