ESceme: Vision-and-Language Navigation with Episodic Scene Memory

Qi Zheng; Daqing Liu; Chaoyue Wang; Jing Zhang; Dadong Wang; Dacheng Tao

arXiv:2303.01032 · cs.CV · July 16, 2024 · 1 citation

TL;DR

This paper introduces ESceme, an episodic scene memory mechanism for vision-and-language navigation that lets an agent recall scenes from its past visits, improving navigation performance on the R2R, R4R, and CVDN benchmarks.

Contribution

We propose ESceme, a simple yet effective episodic memory system that enhances VLN agents by enabling dynamic recall of past scenes, leading to better navigation accuracy.

Findings

01. ESceme outperforms existing methods on the R2R, R4R, and CVDN datasets.

02. The approach achieves first place on the CVDN leaderboard.

03. Enhanced scene memory improves navigation in both short- and long-horizon tasks.

Abstract

Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes. Existing approaches have made enormous progress in navigation in new environments, such as beam search, pre-exploration, and dynamic or hierarchical history encoding. To balance generalization and efficiency, we resort to memorizing visited scenarios apart from the ongoing route while navigating. In this work, we introduce a mechanism of Episodic Scene memory (ESceme) for VLN that wakes an agent's memories of past visits when it enters the current scene. The episodic scene memory allows the agent to envision a bigger picture of the next prediction. This way, the agent learns to utilize dynamically updated information instead of merely adapting to the current observations. We provide a simple yet effective implementation of ESceme by enhancing the…
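To make the idea of a memory that persists across episodes more concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract is cut off before the implementation details, so everything here is an assumption for illustration, including the class name `EpisodicSceneMemory`, the keying by scene and viewpoint identifiers, the running-mean update, and the `enhance` blending function with its `weight` parameter.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class EpisodicSceneMemory:
    """Hypothetical per-scene memory that persists across navigation episodes."""

    def __init__(self) -> None:
        # scene_id -> viewpoint_id -> running-mean feature vector (assumed layout)
        self._features: Dict[str, Dict[str, List[float]]] = defaultdict(dict)
        # scene_id -> viewpoint_id -> number of times the viewpoint was observed
        self._counts: Dict[str, Dict[str, int]] = defaultdict(dict)

    def update(self, scene_id: str, viewpoint_id: str, feature: List[float]) -> None:
        """Fold a new observation into the stored feature as a running mean."""
        features = self._features[scene_id]
        counts = self._counts[scene_id]
        if viewpoint_id not in features:
            features[viewpoint_id] = list(feature)
            counts[viewpoint_id] = 1
            return
        n = counts[viewpoint_id] + 1
        old = features[viewpoint_id]
        features[viewpoint_id] = [o + (f - o) / n for o, f in zip(old, feature)]
        counts[viewpoint_id] = n

    def recall(self, scene_id: str, viewpoint_id: str) -> Optional[List[float]]:
        """Return the remembered feature, or None if the viewpoint is unseen."""
        return self._features.get(scene_id, {}).get(viewpoint_id)


def enhance(current: List[float],
            remembered: Optional[List[float]],
            weight: float = 0.5) -> List[float]:
    """Blend the current observation with the recalled feature (assumed rule)."""
    if remembered is None:
        return list(current)
    return [(1 - weight) * c + weight * r for c, r in zip(current, remembered)]


# Usage: two visits to the same viewpoint in scene "s1", then a recall.
memory = EpisodicSceneMemory()
memory.update("s1", "v1", [1.0, 3.0])
memory.update("s1", "v1", [3.0, 5.0])
blended = enhance([0.0, 0.0], memory.recall("s1", "v1"))
```

The point of the sketch is only the lifecycle: the memory outlives a single route, is updated on every visit, and is consulted when the agent re-enters a scene it has seen before. A real agent would store learned visual embeddings and feed the enhanced features to its navigation policy.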

Code & Models

Repositories

qizhust/esceme
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning