P$^{3}$Nav: End-to-End Perception, Prediction and Planning for Vision-and-Language Navigation
Tianfu Li, Wenbo Chen, Haoxuan Xu, Xinhu Zheng, Haoang Li

TL;DR
P$^{3}$Nav introduces an end-to-end framework that combines perception, prediction, and planning to strengthen scene understanding and improve navigation success in Vision-and-Language Navigation (VLN).
Contribution
P$^{3}$Nav is the first to unify perception, prediction, and planning in a single pipeline for VLN, improving both scene understanding and navigation performance.
Findings
Achieves state-of-the-art results on the REVERIE, R2R-CE, and RxR-CE benchmarks.
Effectively predicts waypoints and semantic map cues to aid navigation.
Enhances scene understanding by integrating object-level and map-level perceptual cues.
Abstract
In Vision-and-Language Navigation (VLN), an agent is required to plan a path to the target specified by the language instruction, using its visual observations. Consequently, prevailing VLN methods primarily focus on building powerful planners through visual-textual alignment. However, these approaches often bypass the imperative of comprehensive scene understanding prior to planning, leaving the agent with insufficient perception or prediction capabilities. Thus, we propose P$^{3}$Nav, a novel end-to-end framework integrating perception, prediction, and planning in a unified pipeline to strengthen the VLN agent's scene understanding and boost navigation success. Specifically, P$^{3}$Nav augments perception by extracting complementary cues from object-level and map-level perspectives. Subsequently, our P$^{3}$Nav predicts waypoints to model the agent's potential future states, endowing…
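To make the perception → prediction → planning ordering in the abstract concrete, here is a toy Python sketch of how three such stages could be chained for a single navigation step. It is a hypothetical illustration, not the authors' implementation: the function names (`perceive`, `predict_waypoints`, `plan`), the concatenation-based fusion of object-level and map-level cues, the top-k selection of waypoint heading bins, and the dot-product scoring against an instruction embedding are all assumptions, since the (truncated) abstract does not specify the actual modules.

```python
import math
from typing import Sequence

Vector = list[float]


def perceive(object_feats: Sequence[Vector], map_feats: Sequence[Vector]) -> list[Vector]:
    """Perception (assumed): concatenate object-level and map-level cues per heading bin."""
    return [list(o) + list(m) for o, m in zip(object_feats, map_feats)]


def predict_waypoints(heatmap: Sequence[float], k: int = 2) -> list[int]:
    """Prediction (assumed): keep the k heading bins with the highest waypoint logits."""
    ranked = sorted(range(len(heatmap)), key=lambda i: heatmap[i], reverse=True)
    return sorted(ranked[:k])


def softmax(xs: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def plan(instruction: Vector, fused: Sequence[Vector],
         waypoints: Sequence[int]) -> tuple[int, list[float]]:
    """Planning (assumed): score each predicted waypoint by its dot product
    with a hypothetical instruction embedding and pick the most probable one."""
    scores = [sum(a * b for a, b in zip(instruction, fused[w])) for w in waypoints]
    probs = softmax(scores)
    best = max(range(len(waypoints)), key=lambda j: probs[j])
    return waypoints[best], probs


if __name__ == "__main__":
    # Four hypothetical heading bins with toy object-level and map-level features.
    fused = perceive([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]],
                     [[0.0], [1.0], [0.2], [0.9]])
    candidates = predict_waypoints([0.1, 2.0, 1.5, 0.3], k=2)  # -> [1, 2]
    choice, probs = plan([0.0, 1.0, 1.0], fused, candidates)
    print(candidates, choice, probs)
```

In this toy run, bin 3 has a strong map-level cue but is never scored because the waypoint stage filters it out first. This illustrates, under the stated assumptions, why the order of the stages matters: planning only chooses among the futures that prediction proposes.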
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotic Path Planning Algorithms · AI-based Problem Solving and Planning
