Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation

Huilin Tian; Jingke Meng; Wei-Shi Zheng; Yuan-Ming Li; Junkai Yan; Yunong Zhang

arXiv:2408.05090 · cs.CV · August 12, 2024

TL;DR

This paper introduces Loc4Plan, a novel framework for outdoor Vision and Language Navigation that emphasizes spatial localization before planning, significantly improving navigation accuracy by incorporating spatial perception.

Contribution

The work presents a new approach that integrates spatial localization prior to action planning, enhancing grounding and decision-making in outdoor VLN tasks.

Findings

1. Outperforms state-of-the-art methods on the Touchdown and map2seq datasets.

2. Demonstrates the effectiveness of spatial localization in outdoor VLN.

3. Shows significant accuracy improvements in navigation tasks.

Abstract

Vision and Language Navigation (VLN) is a challenging task that requires agents to understand instructions and navigate to the destination in a visual environment. One of the key challenges in outdoor VLN is keeping track of which part of the instruction has been completed. To alleviate this problem, previous works mainly focus on grounding the natural language to the visual input, but neglect the crucial role of the agent's spatial position information in the grounding process. In this work, we first explore the substantial effect of spatial position locating on the grounding of outdoor VLN, drawing inspiration from human navigation. In real-world navigation scenarios, before planning a path to the destination, humans typically need to figure out their current location. This observation underscores the pivotal role of spatial localization in the navigation process. In this work, we…

