MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains

Zhaohuan Zhan; Lisha Yu; Sijie Yu; Guang Tan

arXiv:2405.10620 · cs.AI · August 13, 2024

TL;DR

This paper introduces MC-GPT, an approach that combines a memory map with reasoning chains to make vision-and-language navigation more effective and more interpretable, using LLMs together with the agent's navigation history.

Contribution

The paper proposes a topological memory map and a Navigation Chain of Thoughts module that use LLMs to improve navigation strategies and interpretability in VLN tasks.

Findings

1. Improved navigation accuracy on REVERIE and R2R datasets.
2. Enhanced interpretability of navigation reasoning.
3. Effective integration of memory and strategy modules.

Abstract

In the Vision-and-Language Navigation (VLN) task, the agent is required to navigate to a destination following a natural language instruction. While learning-based approaches have been a major solution to the task, they suffer from high training costs and lack of interpretability. Recently, Large Language Models (LLMs) have emerged as a promising tool for VLN due to their strong generalization capabilities. However, existing LLM-based methods face limitations in memory construction and diversity of navigation strategies. To address these challenges, we propose a suite of techniques. Firstly, we introduce a method to maintain a topological map that stores navigation history, retaining information about viewpoints, objects, and their spatial relationships. This map also serves as a global action space. Additionally, we present a Navigation Chain of Thoughts module, leveraging human…
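
The abstract describes the memory map only at a high level. A minimal, hypothetical Python sketch of such a structure follows. It assumes that nodes are viewpoints, that objects observed from a viewpoint are attached to its node, and that each edge carries a heading and a distance as its spatial relation. All names (`TopologicalMemoryMap`, `observe`, `global_action_space`, `path_to`, `describe`) are illustrative and are not taken from the MC-GPT code. Reading "global action space" as every seen-but-unvisited viewpoint, reachable along known edges, is one plausible interpretation rather than the paper's stated definition.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Viewpoint:
    """A node in the (hypothetical) memory map."""
    vp_id: str
    objects: set[str] = field(default_factory=set)  # objects observed from here
    visited: bool = False  # True once the agent has actually stood here


class TopologicalMemoryMap:
    """Illustrative sketch only: stores navigation history as a graph.

    Each edge stores an assumed spatial relation (heading in degrees,
    distance in metres) between adjacent viewpoints.
    """

    def __init__(self) -> None:
        self.nodes: dict[str, Viewpoint] = {}
        self.edges: dict[str, dict[str, tuple[float, float]]] = {}

    def _node(self, vp_id: str) -> Viewpoint:
        # Create the node and its adjacency dict on first sight.
        if vp_id not in self.nodes:
            self.nodes[vp_id] = Viewpoint(vp_id)
            self.edges[vp_id] = {}
        return self.nodes[vp_id]

    def observe(self, current: str,
                neighbours: dict[str, tuple[float, float]],
                objects: set[str]) -> None:
        """Record a visit to `current`, its objects and navigable neighbours."""
        node = self._node(current)
        node.visited = True
        node.objects |= objects
        for nb, (heading, dist) in neighbours.items():
            self._node(nb)
            self.edges[current][nb] = (heading, dist)
            # Reverse edge: opposite heading, same distance.
            self.edges[nb][current] = ((heading + 180.0) % 360.0, dist)

    def global_action_space(self) -> list[str]:
        """Every viewpoint seen but not yet visited, anywhere on the map."""
        return sorted(v for v, n in self.nodes.items() if not n.visited)

    def path_to(self, start: str, goal: str) -> list[str]:
        """Breadth-first path over known edges (fewest hops, ignores distance)."""
        prev: dict[str, str | None] = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                break
            for nb in self.edges.get(cur, {}):
                if nb not in prev:
                    prev[nb] = cur
                    queue.append(nb)
        if goal not in prev:
            return []  # goal not reachable through the map built so far
        path: list[str] = []
        node: str | None = goal
        while node is not None:
            path.append(node)
            node = prev[node]
        return path[::-1]

    def describe(self) -> str:
        """Render the map as plain text, e.g. for inclusion in an LLM prompt."""
        lines = []
        for vp_id, node in sorted(self.nodes.items()):
            status = "visited" if node.visited else "unvisited"
            objs = ", ".join(sorted(node.objects)) or "none"
            links = "; ".join(f"{nb} at {h:.0f} deg, {d:.1f} m"
                              for nb, (h, d) in sorted(self.edges[vp_id].items()))
            lines.append(f"{vp_id} ({status}); objects: {objs}; neighbours: {links}")
        return "\n".join(lines)


m = TopologicalMemoryMap()
m.observe("v0", {"v1": (90.0, 2.0), "v2": (180.0, 3.0)}, {"sofa"})
m.observe("v1", {"v3": (0.0, 1.5)}, {"lamp"})
print(m.global_action_space())  # ['v2', 'v3']
print(m.path_to("v1", "v2"))    # ['v1', 'v0', 'v2']
```

In this sketch the agent can pick any candidate returned by `global_action_space`, not just the neighbours of its current position, and then follow `path_to` back through visited viewpoints. The text from `describe` is the kind of input an LLM-based planner could reason over.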

Taxonomy

Topics: Robotics and Automated Systems · Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning