Structured Scene Memory for Vision-Language Navigation

Hanqing Wang; Wenguan Wang; Wei Liang; Caiming Xiong; Jianbing Shen

arXiv:2103.03454·cs.CV·March 8, 2021

TL;DR

This paper introduces Structured Scene Memory (SSM), a memory architecture for vision-language navigation that explicitly records the environment layout and supports long-term planning, achieving state-of-the-art results on the R2R and R4R benchmarks.

Contribution

The paper proposes SSM, a structured, compartmentalized memory system that captures environment layouts and supports global planning in VLN tasks.

Findings

01 Achieves state-of-the-art performance on the R2R and R4R datasets.

02 Captures environment layouts and disentangles visual and geometric cues.

03 Supports efficient, global navigation planning through a frontier-exploration strategy.

Abstract

Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), which requires an agent to navigate 3D environments by following linguistic instructions. However, current VLN agents simply store their past experiences and observations as latent states in recurrent networks, and so fail to capture environment layouts or plan over the long term. To address these limitations, we propose an architecture called Structured Scene Memory (SSM). It is compartmentalized enough to accurately memorize the percepts gathered during navigation. It also serves as a structured scene representation that captures and disentangles visual and geometric cues in the environment. SSM has a collect-read controller that adaptively collects information to support the current decision and mimics iterative algorithms for long-range reasoning. As SSM provides a…
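
To make the idea of a structured memory with frontier-based planning more concrete, here is a minimal Python sketch. It is not the authors' implementation: the `SceneMemory` class, the `choose_frontier` helper, the place names, and the scoring function are all hypothetical, and the learned collect-read controller is replaced by a plain score lookup. It only illustrates storing visited places as a graph, treating every seen-but-unvisited place as a candidate target, and routing to the chosen target through already explored places.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SceneMemory:
    """Toy topological memory (hypothetical, not the paper's SSM module)."""
    edges: dict = field(default_factory=dict)   # place -> set of adjacent places
    visited: set = field(default_factory=set)   # places the agent has stood on

    def observe(self, place, navigable):
        """Record a visit to `place` and the places navigable from it."""
        self.visited.add(place)
        self.edges.setdefault(place, set())
        for nxt in navigable:
            self.edges[place].add(nxt)
            self.edges.setdefault(nxt, set()).add(place)

    def frontiers(self):
        """Places that were seen from a visited place but never visited."""
        return sorted(p for p in self.edges if p not in self.visited)

    def path(self, start, goal):
        """Breadth-first shortest path that only passes through visited places."""
        parent = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                route = []
                while cur is not None:
                    route.append(cur)
                    cur = parent[cur]
                return route[::-1]
            if cur not in self.visited:
                continue  # an unexplored place can be a goal, not a waypoint
            for nxt in sorted(self.edges[cur]):
                if nxt not in parent:
                    parent[nxt] = cur
                    queue.append(nxt)
        return None


def choose_frontier(memory, score):
    """Pick the best-scoring frontier; `score` stands in for a learned policy."""
    return max(memory.frontiers(), key=score, default=None)
```

For example, after `observe("A", ["B", "C"])` and `observe("B", ["A", "D"])`, the frontiers are `C` and `D`. If the scoring function prefers `C` while the agent stands at `B`, `path("B", "C")` routes back through `A`, which is the kind of global backtracking a purely recurrent latent state does not make explicit.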

Code & Models

Repositories

HanqingWangAI/SSM-VLN (PyTorch, official)
