Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song, Weixing Chen, Yang Liu, Weikai Chen, Guanbin Li, Liang Lin

TL;DR
This paper introduces a long-horizon vision-language navigation task, together with a data generation platform, a benchmark, evaluation metrics, and a new model, to advance research on multi-stage navigation in complex environments.
Contribution
The paper presents the LH-VLN task, the NavGen data generation platform, the LHPR-VLN benchmark, new evaluation metrics, and a multi-granularity memory module (MGDM), addressing the limitations of existing VLN methods in long-term planning.
Findings
Developed the LHPR-VLN benchmark with 3,260 complex tasks.
Proposed new metrics for detailed task success evaluation.
Introduced the MGDM module for improved navigation in dynamic environments.
Abstract
Existing Vision-Language Navigation (VLN) methods primarily focus on single-stage navigation, limiting their effectiveness in multi-stage and long-horizon tasks within complex and dynamic environments. To address these limitations, we propose a novel VLN task, named Long-Horizon Vision-Language Navigation (LH-VLN), which emphasizes long-term planning and decision consistency across consecutive subtasks. Furthermore, to support LH-VLN, we develop an automated data generation platform NavGen, which constructs datasets with complex task structures and improves data utility through a bidirectional, multi-granularity generation approach. To accurately evaluate complex tasks, we construct the Long-Horizon Planning and Reasoning in VLN (LHPR-VLN) benchmark consisting of 3,260 tasks with an average of 150 task steps, serving as the first dataset specifically designed for the long-horizon…
Taxonomy
Topics: Geographic Information Systems Studies · Speech and dialogue systems · Semantic Web and Ontologies
