Scaling Data Generation in Vision-and-Language Navigation

Zun Wang; Jialu Li; Yicong Hong; Yi Wang; Qi Wu; Mohit Bansal; Stephen Gould; Hao Tan; Yu Qiao

arXiv:2307.15644 · cs.CV · August 11, 2023 · 2 cites

TL;DR

This paper introduces a large-scale data generation method for vision-and-language navigation, significantly improving agent performance and generalization across multiple datasets by synthesizing millions of instruction-trajectory pairs from diverse environments.

Contribution

The authors propose a novel data augmentation paradigm that leverages web resources and photorealistic datasets to create extensive training data, enhancing navigation agent performance and generalization.

Findings

1. Raised the success rate by 11% absolute over the previous state of the art on the R2R test split.

2. Reduced the generalization gap between seen and unseen environments to less than 1%.

3. Set new state-of-the-art results on multiple navigation benchmarks.
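
The two headline metrics can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the usual R2R convention that an episode succeeds when the agent stops less than 3 m from the goal, and it takes the generalization gap to be the seen-split success rate minus the unseen-split success rate. The function names and the episode distances are hypothetical and do not come from the paper or its repository.

```python
from typing import Sequence

# Assumed R2R convention: success means stopping less than 3 m from the goal.
SUCCESS_RADIUS_M = 3.0


def success_rate(final_distances_m: Sequence[float],
                 radius_m: float = SUCCESS_RADIUS_M) -> float:
    """Return the percentage of episodes that end within radius_m of the goal."""
    if not final_distances_m:
        raise ValueError("need at least one episode")
    successes = sum(1 for d in final_distances_m if d < radius_m)
    return 100.0 * successes / len(final_distances_m)


def generalization_gap(seen_sr: float, unseen_sr: float) -> float:
    """Return the seen-minus-unseen success rate, in percentage points."""
    return seen_sr - unseen_sr


# Made-up final distances to the goal (metres), purely for illustration.
seen_distances = [0.5, 2.9, 1.2, 4.0]
unseen_distances = [0.8, 3.5, 2.0, 6.1]

sr_seen = success_rate(seen_distances)
sr_unseen = success_rate(unseen_distances)
gap = generalization_gap(sr_seen, sr_unseen)
print(f"seen SR = {sr_seen:.1f}%, unseen SR = {sr_unseen:.1f}%, gap = {gap:.1f} points")
```

Read this way, an "absolute" improvement of 11% means the success rate itself rose by 11 percentage points, and a gap below 1% means the agent performs almost as well in unseen environments as in seen ones.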

Abstract

Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents. To tackle the common data scarcity issue in existing vision-and-language navigation datasets, we propose an effective paradigm for generating large-scale data for learning, which applies 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesizes 4.9 million instruction-trajectory pairs using fully accessible resources on the web. Importantly, we investigate the influence of each component in this paradigm on the agent's performance and study how to adequately apply the augmented data to pre-train and fine-tune an agent. Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a…

Code & Models

Repositories

wz0919/scalevln (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques