A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning

Aishwarya Kamath; Peter Anderson; Su Wang; Jing Yu Koh; Alexander Ku; Austin Waters; Yinfei Yang; Jason Baldridge; Zarana Parekh

arXiv:2210.03112 · cs.LG · April 18, 2023 · 1 cites

TL;DR

This paper introduces a large-scale synthetic dataset of 4.2 million instruction-trajectory pairs for vision-and-language navigation, leveraging imitation learning and synthetic instruction generation to significantly improve agent performance.

Contribution

It presents a novel large-scale synthetic dataset and a simple transformer-based imitation learning approach that outperforms existing methods on VLN tasks.

Findings

01

Outperforms all existing RL agents on the RxR dataset

02

Improves NDTW from 71.1 to 79.1 in seen environments

03

Improves generalization to unseen environments
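
The NDTW figure in finding 02 is normalized Dynamic Time Warping, a path-fidelity metric that scores how closely an agent's trajectory follows the reference path. The sketch below is a minimal illustration, not the paper's evaluation code: it assumes paths are lists of 2D points in meters and uses a 3 m threshold (a common success distance in VLN benchmarks, chosen here as an assumption). The function names `dtw` and `ndtw` are hypothetical. The page reports the metric as a percentage, so the value returned here would be multiplied by 100.

```python
import math


def dtw(reference, query):
    """Dynamic time warping cost between two sequences of (x, y) points."""
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = minimal cumulative distance aligning the first i
    # reference points with the first j query points.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(reference[i - 1], query[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance along the reference only
                cost[i][j - 1],      # advance along the query only
                cost[i - 1][j - 1],  # advance along both
            )
    return cost[n][m]


def ndtw(reference, query, threshold=3.0):
    """Normalized DTW in (0, 1]; threshold (meters) is an assumed value."""
    return math.exp(-dtw(reference, query) / (len(reference) * threshold))


if __name__ == "__main__":
    reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    agent = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]  # same shape, 1 m offset
    print(ndtw(reference, reference))  # identical paths give 1.0
    print(ndtw(reference, agent))      # an offset path scores below 1.0
```

A score of 1.0 means the agent retraced the reference exactly, and the score decays exponentially as the aligned distance grows, which is why the metric rewards following the instructed route rather than only reaching the goal.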

Abstract

Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-language navigation instructions in photorealistic environments, as a step towards robots that can follow human instructions. However, given the scarcity of human instruction data and limited diversity in the training environments, these agents still struggle with complex language grounding and spatial language understanding. Pretraining on large text and image-text datasets from the web has been extensively explored but the improvements are limited. We investigate large-scale augmentation with synthetic instructions. We take 500+ indoor environments captured in densely-sampled 360 degree panoramas, construct navigation trajectories through these panoramas, and generate a visually-grounded instruction for each trajectory using Marky, a high-quality multilingual navigation instruction generator. We…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Test