Vision-Language Navigation with Random Environmental Mixup

Chong Liu; Fengda Zhu; Xiaojun Chang; Xiaodan Liang; Zongyuan Ge; Yi-Dong Shen

arXiv:2106.07876 · cs.CV · November 2, 2021

TL;DR

This paper introduces Random Environmental Mixup (REM), a novel data augmentation technique for Vision-Language Navigation that creates cross-connected house scenes to improve agent generalization to unseen environments.

Contribution

The paper proposes REM, a new data augmentation method that explicitly reduces data bias across different house scenes in VLN tasks, enhancing generalization to unseen environments.

Findings

01

REM improves navigation performance in unseen scenes.

02

The approach reduces the performance gap between seen and unseen environments.

03

The model trained with REM achieves state-of-the-art results on VLN benchmarks.

Abstract

Vision-Language Navigation (VLN) tasks require an agent to navigate step by step while perceiving visual observations and comprehending a natural language instruction. Large data bias, caused by the disparity between the small data scale and the large navigation space, makes the VLN task challenging. Previous works have proposed various data augmentation methods to reduce data bias. However, these works do not explicitly reduce the data bias across different house scenes. As a result, the agent overfits to the seen scenes and navigates poorly in unseen scenes. To tackle this problem, we propose the Random Environmental Mixup (REM) method, which generates cross-connected house scenes as augmented data by mixing up environments. Specifically, we first select key viewpoints according to the room connection graph for each scene. Then, we cross-connect…
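
The cross-connection idea in the abstract can be illustrated with a small graph sketch. This is not the authors' implementation: the abstract is truncated here, so the details below are assumptions made for illustration only. Each scene is assumed to be an undirected viewpoint graph stored as an adjacency dict, viewpoint names are assumed to be unique across scenes, and each scene's key edge is assumed to be given already (how key viewpoints are actually chosen from the room connection graph is not shown). The helper names `component` and `cross_connect` are hypothetical.

```python
from collections import deque


def component(adj, start):
    """Return the set of viewpoints reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def cross_connect(adj_a, key_a, adj_b, key_b):
    """Cut each scene at its (assumed) key edge and rejoin the halves across scenes."""
    (a1, a2), (b1, b2) = key_a, key_b
    # Merge both graphs into one adjacency map, copying the neighbor sets.
    mixed = {n: set(nbrs) for adj in (adj_a, adj_b) for n, nbrs in adj.items()}
    # Remove the two key edges so each scene falls into two halves.
    for u, v in (key_a, key_b):
        mixed[u].discard(v)
        mixed[v].discard(u)
    # Add the two cross-scene edges that join halves from different scenes.
    for u, v in ((a1, b2), (b1, a2)):
        mixed[u].add(v)
        mixed[v].add(u)
    return mixed


# Two toy scenes; the viewpoint names are made up for this example.
scene_a = {
    "a_hall": {"a_kitchen"},
    "a_kitchen": {"a_hall", "a_bed"},
    "a_bed": {"a_kitchen"},
}
scene_b = {
    "b_hall": {"b_bath"},
    "b_bath": {"b_hall", "b_office"},
    "b_office": {"b_bath"},
}

mixed = cross_connect(
    scene_a, ("a_kitchen", "a_bed"),
    scene_b, ("b_bath", "b_office"),
)
mixed_scene_1 = component(mixed, "a_hall")
mixed_scene_2 = component(mixed, "b_hall")
```

In this toy case each resulting component contains viewpoints from both original scenes, which is the sense in which the augmented scenes are "cross-connected". Re-pairing instructions and trajectories for the mixed scenes is outside the scope of this sketch.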

Code & Models

Repositories

lcfractal/vlnrem (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Convolution · Dense Connections · Q-Learning · Deep Q-Network · Random Ensemble Mixture · Mixup