Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong

TL;DR
This paper presents a self-monitoring navigation agent that uses auxiliary progress estimation and visual-textual grounding to improve performance in vision-and-language navigation tasks, achieving state-of-the-art results.
Contribution
Introduces a novel self-monitoring agent with visual-textual co-grounding and progress estimation modules for VLN tasks, significantly improving success rates.
Findings
Achieves an 8% absolute increase in success rate on the unseen test set.
Demonstrates the effectiveness of progress monitoring in navigation.
Provides ablation studies confirming component contributions.
Abstract
The Vision-and-Language Navigation (VLN) task entails an agent following navigational instruction in photo-realistic unknown environments. This challenging task demands that the agent be aware of which instruction was completed, which instruction is needed next, which way to go, and its navigation progress towards the goal. In this paper, we introduce a self-monitoring agent with two complementary components: (1) visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images and (2) progress monitor to ensure the grounded instruction correctly reflects the navigation progress. We test our self-monitoring agent on a standard benchmark and analyze our proposed approach through a series of ablation studies that elucidate the contributions of the primary components. Using…
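As a rough illustration of how an auxiliary progress estimate can be trained alongside action selection, the following minimal Python sketch combines an action cross-entropy term with a squared-error term on predicted progress. The abstract does not specify the loss, the normalization, or the weighting, so everything here is an assumption: the function names (`progress_target`, `action_cross_entropy`, `self_monitoring_loss`), the distance-based progress target, and the `weight` parameter are all hypothetical and not taken from the authors' implementation.

```python
import math


def progress_target(start_dist, current_dist):
    """Hypothetical progress label: 0.0 at the start, 1.0 at the goal.

    Assumes progress is measured as the normalized reduction in distance
    to the goal, clamped to [-1, 1] so that moving away yields negative values.
    """
    if start_dist <= 0:
        return 1.0
    return max(-1.0, min(1.0, 1.0 - current_dist / start_dist))


def action_cross_entropy(logits, target_index):
    """Cross-entropy of a softmax over candidate directions (log-sum-exp form)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]


def self_monitoring_loss(logits, target_index, predicted_progress,
                         target_progress, weight=0.5):
    """Weighted sum of the action loss and the auxiliary progress loss.

    `weight` is an assumed trade-off hyperparameter, not a value from the paper.
    """
    ce = action_cross_entropy(logits, target_index)
    mse = (predicted_progress - target_progress) ** 2
    return weight * ce + (1.0 - weight) * mse


if __name__ == "__main__":
    # Agent started 10 m from the goal and is now 5 m away.
    target = progress_target(10.0, 5.0)
    loss = self_monitoring_loss([2.0, 0.5, -1.0], 0, 0.3, target)
    print(f"progress target: {target:.2f}, combined loss: {loss:.4f}")
```

The auxiliary term only adds a training signal; in this sketch it does not change how the action is chosen, which keeps the two objectives easy to ablate separately.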
Taxonomy
Topics: Robotic Path Planning Algorithms · Optimization and Search Problems · Robotics and Sensor-Based Localization
