Reinforced Structured State-Evolution for Vision-Language Navigation

Jinyu Chen; Chen Gao; Erli Meng; Qiong Zhang; Si Liu

arXiv:2204.09280 · cs.CV · May 27, 2022

TL;DR

This paper introduces a novel graph-based structured state model for vision-language navigation, enhancing the agent’s ability to utilize environment layout clues and improving navigation accuracy significantly.

Contribution

It proposes a structured state-evolution model with reinforcement learning to better maintain environment layout information during navigation.

Findings

01. Improves SPL by 3% on the R2R dataset.

02. Enhances long-term navigation performance.

03. Utilizes graph-based environment representations.
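
For readers unfamiliar with the metric in the first finding: SPL (Success weighted by Path Length) is a standard VLN evaluation metric. It averages, over episodes, the success indicator multiplied by the ratio of the shortest-path length to the longer of the shortest and the taken path. The sketch below computes it for a few made-up episodes. The function name `spl` and the episode values are illustrative assumptions, not taken from the paper or its repository.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Assumes every shortest-path length is strictly positive.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if len(successes) == 0:
        raise ValueError("need at least one episode")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if ok:
            # A successful episode scores 1.0 only if the agent took the shortest path.
            total += shortest / max(taken, shortest)
        # Failed episodes contribute 0 regardless of path length.
    return total / len(successes)


# Hypothetical episodes: one optimal success, one success with a detour, one failure.
score = spl(
    successes=[True, True, False],
    shortest_lengths=[10.0, 8.0, 12.0],
    path_lengths=[10.0, 16.0, 12.0],
)
print(f"SPL = {score:.3f}")
```

Because failed episodes score zero and detours are penalized, SPL rewards agents that both reach the goal and do so efficiently.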

Abstract

The Vision-and-Language Navigation (VLN) task requires an embodied agent to navigate to a remote location by following a natural language instruction. Previous methods usually adopt a sequence model (e.g., Transformer or LSTM) as the navigator. In such a paradigm, the sequence model predicts an action at each step from a maintained navigation state, which is generally represented as a one-dimensional vector. However, the crucial navigation clues for embodied navigation (i.e., the object-level environment layout) are discarded, since the maintained vector is essentially unstructured. In this paper, we propose a novel Structured state-Evolution (SEvol) model to effectively maintain the environment layout clues for VLN. Specifically, we use a graph-based feature to represent the navigation state instead of the vector-based state. Accordingly, we devise a Reinforced Layout clues Miner (RLM)…
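
To make the contrast between a vector-based and a graph-based navigation state concrete, here is a minimal toy sketch. It is not the SEvol model or its RLM module: the function names, the neighbour-averaging update, the `mix` parameter, and the example features are all assumptions chosen for illustration. It only shows that a graph state keeps one feature per object plus the layout (adjacency), whereas pooling into a single vector drops that structure.

```python
from typing import List

Vector = List[float]


def pool_to_vector(nodes: List[Vector]) -> Vector:
    """Vector-style state: mean-pool all object features into one vector.

    The result no longer records which object had which feature or how
    objects were arranged.
    """
    return [sum(vals) / len(nodes) for vals in zip(*nodes)]


def evolve_graph_state(
    nodes: List[Vector],
    adjacency: List[List[int]],
    mix: float = 0.5,
) -> List[Vector]:
    """Graph-style state: one illustrative update step.

    Each node blends its own feature with the mean feature of its
    neighbours, so the layout encoded in `adjacency` shapes the new state.
    """
    updated: List[Vector] = []
    for i, feat in enumerate(nodes):
        neighbours = [
            nodes[j]
            for j, connected in enumerate(adjacency[i])
            if connected and j != i
        ]
        if not neighbours:
            # Isolated nodes keep their feature unchanged.
            updated.append(list(feat))
            continue
        mean = [sum(vals) / len(neighbours) for vals in zip(*neighbours)]
        updated.append([(1 - mix) * a + mix * b for a, b in zip(feat, mean)])
    return updated


# Hypothetical scene: three objects in a chain (0 - 1 - 2) with 2-d features.
object_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layout = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

vector_state = pool_to_vector(object_features)
graph_state = evolve_graph_state(object_features, layout)
print("vector state:", vector_state)
print("graph state: ", graph_state)
```

The graph state stays a list of per-object features after the update, so later steps can still reason about individual objects and their neighbours. The pooled vector cannot be unpacked that way.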

Code & Models

Repositories

chenjinyubuaa/sevol (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Adam · Multi-Head Attention · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections