StratXplore: Strategic Novelty-seeking and Instruction-aligned Exploration for Vision and Language Navigation

Muraleekrishna Gopinathan; Jumana Abu-Khalaf; David Suter; Martin Masek

arXiv:2409.05593·cs.RO·September 10, 2024

Open Access

TL;DR

StratXplore introduces a memory-based, mistake-aware exploration strategy for vision-language navigation, enabling robots to better recover from errors by selecting optimal unexplored viewpoints aligned with instructions.

Contribution

The paper proposes a novel exploration method that leverages memory and mistake-awareness to improve navigation success in VLN tasks.

Findings

01 Improved success rates on VLN datasets.

02 Effective recovery from navigational mistakes.

03 Enhanced decision-making with exploration of unexplored frontiers.

Abstract

Embodied navigation requires robots to understand and interact with the environment based on given tasks. Vision-Language Navigation (VLN) is an embodied navigation task in which a robot navigates previously seen and unseen environments based on linguistic instructions and visual inputs. VLN agents need access to both local and global action spaces: the former for immediate decision making and the latter for recovering from navigational mistakes. Prior VLN agents rely only on instruction-viewpoint alignment for local and global decision making, and back-track to a previously visited viewpoint if the instruction and the current viewpoint mismatch. These methods are prone to mistakes due to the complexity of the instruction and the partial observability of the environment. We posit that back-tracking is sub-optimal and that an agent that is aware of its mistakes can recover efficiently. For…
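
To make the contrast with back-tracking concrete, the following is a minimal sketch of choosing among unexplored viewpoints in a global action space. It is not the paper's implementation: the function name `select_frontier`, the `novelty_weight` parameter, the linear combination of scores, and the example values are all assumptions made for illustration. It only shows the general idea of ranking every remembered frontier by instruction alignment and novelty instead of returning to the previously visited viewpoint.

```python
def select_frontier(candidates, novelty_weight=0.5):
    """Return the id of the unexplored viewpoint with the highest combined score.

    candidates: dict mapping viewpoint id -> (alignment, novelty), both in [0, 1].
    The linear weighting below is a hypothetical choice, not taken from the paper.
    """
    if not candidates:
        return None  # nothing left to explore

    def score(item):
        _, (alignment, novelty) = item
        return (1.0 - novelty_weight) * alignment + novelty_weight * novelty

    best_id, _ = max(candidates.items(), key=score)
    return best_id


# Hypothetical frontier memory: (instruction alignment, novelty) per viewpoint.
candidates = {
    "hallway": (0.9, 0.2),
    "kitchen": (0.6, 0.8),
    "stairs": (0.3, 0.9),
}
chosen = select_frontier(candidates)
```

With `novelty_weight=0.0` the sketch reduces to pure instruction-viewpoint alignment, which is the criterion the abstract attributes to prior agents; raising the weight shifts the choice toward less familiar viewpoints.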

Taxonomy

Topics: Multimodal Machine Learning Applications