LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation

Bo Miao; Weijia Liu; Jun Luo; Lachlan Shinnick; Jian Liu; Thomas Hamilton-Smith; Yuhe Yang; Zijie Wu; Vanja Videnovic; Feras Dayoub; Anton van den Hengel

arXiv:2602.02220 · cs.CV · February 3, 2026



TL;DR

LangMap introduces a comprehensive benchmark for open-vocabulary goal navigation in 3D indoor environments, enabling evaluation of language understanding and navigation at multiple semantic levels.

Contribution

The paper presents LangMap, a large-scale, high-quality benchmark with diverse tasks and annotations for evaluating language-guided embodied navigation.

Findings

1. Rich context and memory improve navigation success.
2. Long-tailed, small, and distant goals remain challenging.
3. LangMap's richer descriptions outperform those of previous benchmarks.

Abstract

The relationships between objects and language are fundamental to meaningful communication between humans and AI, and to practically useful embodied intelligence. We introduce HieraNav, a multi-granularity, open-vocabulary goal navigation task where agents interpret natural language instructions to reach targets at four semantic levels: scene, room, region, and instance. To this end, we present Language as a Map (LangMap), a large-scale benchmark built on real-world 3D indoor scans with comprehensive human-verified annotations and tasks spanning these levels. LangMap provides region labels, discriminative region descriptions, discriminative instance descriptions covering 414 object categories, and over 18K navigation tasks. Each target features both concise and detailed descriptions, enabling evaluation across different instruction styles. LangMap achieves superior annotation quality,…
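To make the task structure described in the abstract more concrete, here is a minimal Python sketch of one way a single HieraNav task record could be represented. The four goal levels (scene, room, region, instance) and the pairing of a concise with a detailed description come from the abstract. The class names, the fields `scene_id` and `category`, and the `count_by_level` helper are hypothetical and do not reflect LangMap's released data format.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GoalLevel(Enum):
    """The four semantic levels named in the HieraNav task definition."""
    SCENE = "scene"
    ROOM = "room"
    REGION = "region"
    INSTANCE = "instance"


@dataclass(frozen=True)
class NavTask:
    """Hypothetical record for one goal-navigation task (not LangMap's actual schema)."""
    scene_id: str                   # assumed identifier of the 3D indoor scan
    level: GoalLevel                # semantic level of the navigation target
    concise: str                    # short instruction describing the target
    detailed: str                   # longer, more discriminative description
    category: Optional[str] = None  # assumed field: object category for instance goals

    def instruction(self, style: str) -> str:
        """Return the target description in the requested style."""
        if style == "concise":
            return self.concise
        if style == "detailed":
            return self.detailed
        raise ValueError(f"unknown instruction style: {style!r}")


def count_by_level(tasks: list[NavTask]) -> dict[GoalLevel, int]:
    """Tally tasks per semantic level, e.g. to check coverage across granularities."""
    counts = {level: 0 for level in GoalLevel}
    for task in tasks:
        counts[task.level] += 1
    return counts
```

Keeping both description styles on the same record lets the same target be evaluated under either instruction style simply by switching the `style` argument.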

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Human Pose and Action Recognition