MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

Jiaqi Chen; Bingqian Lin; Ran Xu; Zhenhua Chai; Xiaodan Liang; Kwan-Yee K. Wong

arXiv:2401.07314·cs.AI·June 21, 2024·5 cites

TL;DR

MapGPT introduces a novel map-guided prompting approach with adaptive path planning for vision-and-language navigation, significantly improving zero-shot performance and enabling global environment understanding in embodied agents.

Contribution

The paper presents a new map-guided GPT-based agent with online maps and adaptive planning, enhancing global exploration and path planning in VLN tasks.

Findings

01

Achieves state-of-the-art zero-shot performance on R2R and REVERIE (~10% and ~12% success rate (SR) improvements, respectively)

02

Demonstrates emergent global thinking and path planning abilities in GPT-based agents

03

Applicable to both GPT-4 and GPT-4V models

Abstract

Embodied agents equipped with GPT as their brains have exhibited extraordinary decision-making and generalization abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt GPT-4 to select potential locations within localized environments, without constructing an effective "global-view" for the agent to understand the overall environment. In this work, we present a novel map-guided GPT-based agent, dubbed MapGPT, which introduces an online linguistic-formed map to encourage global exploration. Specifically, we build an online map and incorporate it into the prompts that include node information and topological relationships, to help GPT understand the spatial environment. Benefiting from this design, we further propose an adaptive planning mechanism to assist the agent in performing multi-step path planning based on a map,…
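To make the map-in-prompt idea more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the online map is an undirected graph over observed viewpoints, updated at each step with the current viewpoint's navigable neighbors, and rendered as plain-text lines that can be placed in the prompt. The class `OnlineMap`, its methods, and the "Place i is connected with Places ..." wording are hypothetical illustrations. MapGPT's actual prompt template is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineMap:
    """Hypothetical online topological map (illustrative, not MapGPT's code).

    Nodes are viewpoints the agent has observed; edges are navigability links.
    """
    ids: dict[str, int] = field(default_factory=dict)        # viewpoint id -> place index
    edges: dict[int, set[int]] = field(default_factory=dict)  # place index -> neighbor indices
    visited: set[int] = field(default_factory=set)            # places the agent has stood at

    def _index(self, viewpoint: str) -> int:
        # Assign place indices in order of first observation.
        if viewpoint not in self.ids:
            idx = len(self.ids)
            self.ids[viewpoint] = idx
            self.edges[idx] = set()
        return self.ids[viewpoint]

    def update(self, current: str, navigable: list[str]) -> None:
        """Mark the current viewpoint visited and link it to its navigable neighbors."""
        cur = self._index(current)
        self.visited.add(cur)
        for vp in navigable:
            nb = self._index(vp)
            self.edges[cur].add(nb)  # undirected edge, stored in both directions
            self.edges[nb].add(cur)

    def to_prompt(self) -> str:
        """Render node status and topology as text lines for the LLM prompt."""
        lines = []
        for idx in sorted(self.edges):
            neighbors = ", ".join(str(n) for n in sorted(self.edges[idx]))
            status = "visited" if idx in self.visited else "unvisited"
            lines.append(f"Place {idx} ({status}) is connected with Places {neighbors}")
        return "\n".join(lines)


if __name__ == "__main__":
    m = OnlineMap()
    m.update("vp_a", ["vp_b", "vp_c"])  # step 1: agent at vp_a
    m.update("vp_b", ["vp_a", "vp_d"])  # step 2: agent moved to vp_b
    print(m.to_prompt())
    # Place 0 (visited) is connected with Places 1, 2
    # Place 1 (visited) is connected with Places 0, 3
    # Place 2 (unvisited) is connected with Places 0
    # Place 3 (unvisited) is connected with Places 1
```

Storing the map as adjacency sets keeps each per-step update cheap. It also lets the prompt list unvisited frontier places alongside visited ones, which is the kind of global view the abstract argues local-only prompting lacks.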

Videos

MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation (Underline)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · GPT-4 · Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout