CMMR-VLN: Vision-and-Language Navigation via Continual Multimodal Memory Retrieval
Haozhou Li, Xiangyu Dong, Huiyan Jiang, Yaoming Zhou, Xiaoguang Ma

TL;DR
CMMR-VLN enhances vision-and-language navigation by integrating structured multimodal memory and reflection mechanisms, significantly improving success rates in complex and unfamiliar environments compared to existing models.
Contribution
This work introduces a novel VLN framework with structured memory retrieval and reflection capabilities, enabling better recall and utilization of prior experiences during navigation.
Findings
Achieved up to 52.9% success rate improvement over NavGPT.
Demonstrated effective retrieval of relevant experiences during navigation.
Showed significant performance gains in both simulation and real-world tests.
Abstract
Although large language models (LLMs) have been introduced into vision-and-language navigation (VLN) to improve instruction comprehension and generalization, existing LLM-based VLN methods lack the ability to selectively recall and use relevant prior experiences to help with navigation tasks, which limits their performance in long-horizon and unfamiliar scenarios. In this work, we propose CMMR-VLN (Continual Multimodal Memory Retrieval based VLN), a VLN framework that endows LLM agents with structured memory and reflection capabilities. Specifically, CMMR-VLN constructs a multimodal experience memory indexed by panoramic visual images and salient landmarks to retrieve relevant experiences during navigation, introduces a retrieval-augmented generation pipeline to mimic how experienced human navigators leverage prior knowledge, and incorporates a reflection-based memory update strategy that…
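The abstract describes a memory indexed by panoramic images and salient landmarks, queried during navigation, with the retrieved experiences fed into the LLM's prompt. It does not say how entries are scored, so the sketch below is a hypothetical illustration only. It assumes each panorama is summarized by an embedding vector, scores stored entries by a weighted sum of cosine similarity (visual) and Jaccard overlap (landmarks), and prepends the top-k notes to the instruction. The names `Experience`, `ExperienceMemory`, `retrieve`, `build_prompt`, and the weight `alpha` are invented for this example and are not CMMR-VLN's code. The reflection-based memory update is not modeled, since the abstract is truncated before describing it.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One stored experience (hypothetical schema, not taken from the paper)."""
    visual_embedding: list[float]  # assumed embedding of the panoramic view
    landmarks: frozenset[str]      # salient landmarks observed at that viewpoint
    note: str                      # textual lesson to feed back to the LLM


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap of two landmark sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class ExperienceMemory:
    alpha: float = 0.5  # hypothetical weight: visual vs. landmark similarity
    items: list[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        # Plain append; the paper's reflection-based update is NOT modeled here.
        self.items.append(exp)

    def retrieve(self, query_embedding, query_landmarks, k=2):
        """Return the k stored experiences with the highest combined score."""
        scored = [
            (self.alpha * cosine(query_embedding, e.visual_embedding)
             + (1 - self.alpha) * jaccard(query_landmarks, e.landmarks), e)
            for e in self.items
        ]
        # Sort on the score only, so Experience objects are never compared.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


def build_prompt(instruction: str, retrieved: list[Experience]) -> str:
    """Retrieval-augmented prompt: the instruction plus retrieved lessons."""
    lines = [f"Instruction: {instruction}", "Relevant past experiences:"]
    lines += [f"- {e.note}" for e in retrieved]
    return "\n".join(lines)


# Usage with toy 2-D embeddings (illustrative values only)
memory = ExperienceMemory()
memory.add(Experience([1.0, 0.0], frozenset({"sofa", "tv"}),
                      "Turned left at the sofa to reach the kitchen."))
memory.add(Experience([0.0, 1.0], frozenset({"stairs"}),
                      "Went upstairs too early; pass the hallway first."))
top = memory.retrieve([0.9, 0.1], frozenset({"sofa"}), k=1)
prompt = build_prompt("Go to the kitchen.", top)
print(prompt)
```

Combining a visual score with a landmark score reflects the two index keys named in the abstract; how (or whether) the actual system weights them is not stated there.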
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Natural Language Processing Techniques
