World-Consistent Data Generation for Vision-and-Language Navigation
Yu Zhong, Rui Zhang, Zihao Zhang, Shuo Wang, Chuan Fang, Xishan Zhang, Jiaming Guo, Shaohui Peng, Di Huang, Yanyang Yan, Xing Hu, Qi Guo

TL;DR
This paper introduces WCGEN, a novel data augmentation framework for Vision-and-Language Navigation that generates diverse, world-consistent training data to improve agent generalization in unseen environments.
Contribution
The paper presents a two-stage data generation method that ensures spatial and wraparound consistency using 3D knowledge, enhancing VLN model performance.
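One way to read "wraparound consistency" is that a 360° panorama should join seamlessly where its right edge meets its left edge. The snippet below is a minimal sketch of that idea only, not the paper's method. It assumes the panorama is an equirectangular image stored as a NumPy array, and the helper name `seam_error` is hypothetical.

```python
import numpy as np


def seam_error(pano: np.ndarray) -> float:
    """Mean absolute difference between the first and last pixel columns.

    Hypothetical helper (not from the paper): a low value suggests the
    panorama closes smoothly when wrapped around 360 degrees.
    """
    left = pano[:, 0].astype(np.float64)
    right = pano[:, -1].astype(np.float64)
    return float(np.mean(np.abs(left - right)))


height, width = 4, 360

# Horizontally periodic intensity: adjacent columns across the seam nearly match.
angles = 2.0 * np.pi * np.arange(width) / width
periodic = np.tile(0.5 + 0.5 * np.cos(angles), (height, 1))

# Linear ramp from 0 to 1: the left and right edges disagree completely.
ramp = np.tile(np.linspace(0.0, 1.0, width), (height, 1))

print("periodic seam error:", seam_error(periodic))
print("ramp seam error:", seam_error(ramp))
```

A generated panorama with a large seam error would show a visible discontinuity when an agent turns past the image boundary, which is the kind of artifact a wraparound-consistency constraint is meant to avoid.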
Findings
Achieves state-of-the-art results on multiple VLN datasets.
Significantly improves generalization to unseen environments.
Demonstrates the effectiveness of world-consistent data augmentation.
Abstract
Vision-and-Language Navigation (VLN) is a challenging task that requires an agent to navigate through photorealistic environments following natural-language instructions. A main obstacle in VLN is data scarcity, which leads to poor generalization to unseen environments. Although data augmentation is a promising way to scale up the dataset, generating VLN data that is both diverse and world-consistent remains difficult. To address this issue, we propose world-consistent data generation (WCGEN), an effective data-augmentation framework that satisfies both diversity and world-consistency and is aimed at enhancing the generalization of agents to novel environments. Roughly, our framework consists of two stages: the trajectory stage, which leverages a point-cloud-based technique to ensure spatial coherency among viewpoints, and the viewpoint stage, which adopts a novel…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies
