Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation

Jiaqi Chen; Bingqian Lin; Xinmin Liu; Lin Ma; Xiaodan Liang; Kwan-Yee K. Wong

arXiv:2407.05890·cs.RO·August 21, 2024

TL;DR

This paper introduces AO-Planner, a novel zero-shot affordances-oriented planning method for continuous vision-language navigation, integrating foundation models for low-level control and high-level decision-making, achieving state-of-the-art results.

Contribution

The paper presents AO-Planner, combining foundation models for low-level motion planning and high-level reasoning in continuous VLN, bridging the gap between high-level task planning and low-level control.

Findings

01 Achieves an 8.8% improvement in SPL on the R2R-CE and RxR-CE datasets.

02 Can serve as a data annotator for pseudo-label generation.

03 Attains a 47% success rate with a data-efficient predictor.
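For readers unfamiliar with the SPL metric cited in the first finding, the sketch below computes Success weighted by Path Length in its commonly used form: each successful episode contributes the ratio of the shortest-path length to the longer of the shortest and actual path lengths, and the sum is averaged over all episodes. The function name `spl` and its parameter names are illustrative assumptions, not taken from the paper or from any evaluation toolkit.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest: Sequence[float],
        taken: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]: whether episode i reached the goal
    shortest[i]:  shortest-path length from start to goal
    taken[i]:     length of the path the agent actually followed
    (hypothetical argument names, chosen for this illustration)
    """
    if not (len(successes) == len(shortest) == len(taken)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, l, p in zip(successes, shortest, taken):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(p, l)
        # A zero-length episode that succeeds is treated as perfectly efficient.
        total += l / denom if denom > 0 else 1.0
    return total / len(successes)


# Three episodes: optimal success, failure, success with a path twice as long.
score = spl([True, False, True], [10.0, 5.0, 8.0], [10.0, 7.0, 16.0])
```

In this example the three episodes contribute 1.0, 0.0, and 0.5, so the score is 0.5. A higher SPL therefore rewards agents that both succeed and avoid detours.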

Abstract

LLM-based agents have demonstrated impressive zero-shot performance in vision-language navigation (VLN) tasks. However, existing LLM-based methods often focus only on solving high-level task planning by selecting nodes in predefined navigation graphs for movements, overlooking low-level control in navigation scenarios. To bridge this gap, we propose AO-Planner, a novel Affordances-Oriented Planner for continuous VLN tasks. Our AO-Planner integrates various foundation models to achieve affordances-oriented low-level motion planning and high-level decision-making, both performed in a zero-shot setting. Specifically, we employ a Visual Affordances Prompting (VAP) approach, where the visible ground is segmented by SAM to provide navigational affordances, based on which the LLM selects potential candidate waypoints and plans low-level paths towards selected waypoints. We further propose a…
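To make the idea of navigational affordances from a ground mask concrete, here is a minimal sketch of one way candidate points could be drawn from a binary ground segmentation. It is an illustration under stated assumptions, not the paper's procedure: the regular-grid sampling, the `stride` parameter, and the function name `sample_ground_points` are all hypothetical, and the mask is a plain nested list standing in for a segmentation model's output.

```python
from typing import List, Tuple

Mask = List[List[bool]]  # mask[row][col] is True where the pixel is ground


def sample_ground_points(mask: Mask, stride: int) -> List[Tuple[int, int]]:
    """Return (row, col) grid points that fall on ground pixels.

    Hypothetical helper: points are taken on a regular grid with the given
    stride, and only those lying on the ground mask are kept as candidates.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")

    points: List[Tuple[int, int]] = []
    for row in range(stride // 2, len(mask), stride):
        for col in range(stride // 2, len(mask[row]), stride):
            if mask[row][col]:
                points.append((row, col))
    return points


# Toy 4x4 mask in which only the bottom two rows are ground.
toy_mask = [
    [False, False, False, False],
    [False, False, False, False],
    [True, True, True, True],
    [True, True, True, True],
]
candidates = sample_ground_points(toy_mask, stride=2)
```

In a pipeline like the one the abstract describes, points of this kind could be drawn onto the image as labeled markers so that a language model can refer to them when choosing waypoints; how the markers are actually generated and presented is not specified in the text above.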

Taxonomy

Topics: Robotic Path Planning Algorithms · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications

Methods: Focus · Segment Anything Model