Curriculum Learning for Vision-and-Language Navigation
Jiwen Zhang, Zhongyu Wei, Jianqing Fan, Jiajie Peng

TL;DR
This paper introduces a curriculum learning approach for Vision-and-Language Navigation that improves agent performance and training efficiency by systematically ordering training samples based on difficulty.
Contribution
We propose a novel curriculum-based training paradigm for VLN that re-arranges the training data to balance human prior knowledge about sample difficulty with the agent's learning progress.
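As a rough illustration of what re-arranging a dataset for curriculum training can look like, the sketch below groups samples into cumulative stages of increasing difficulty. The `Sample` class, its integer `difficulty` label, and the `curriculum_stages` helper are hypothetical. This summary does not say how the paper scores difficulty, so any per-sample difficulty level assigned from human prior knowledge could be plugged in.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    instr_id: str
    difficulty: int  # hypothetical difficulty level assigned from human prior knowledge


def curriculum_stages(samples: List[Sample]) -> List[List[Sample]]:
    """Build cumulative curriculum stages.

    Stage k contains every sample whose difficulty is at most the k-th
    smallest distinct difficulty level, so easy samples stay in the mix
    while harder ones are gradually added.
    """
    levels = sorted({s.difficulty for s in samples})
    return [[s for s in samples if s.difficulty <= level] for level in levels]


data = [Sample("a", 2), Sample("b", 1), Sample("c", 3), Sample("d", 1)]
for i, stage in enumerate(curriculum_stages(data), start=1):
    print(i, [s.instr_id for s in stage])
```

Keeping earlier (easier) samples in later stages is one common design choice; it helps limit forgetting of easy cases when the agent moves on to harder ones.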
Findings
Significant performance improvements on the R2R benchmark
Enhanced generalizability of navigation agents
Increased training efficiency without added model complexity
Abstract
Vision-and-Language Navigation (VLN) is a task in which an agent navigates an embodied indoor environment by following human instructions. Previous works ignore the distribution of sample difficulty, and we argue that this potentially degrades their agents' performance. To tackle this issue, we propose a novel curriculum-based training paradigm for VLN tasks that balances human prior knowledge about training samples with the agent's learning progress. We develop principles for curriculum design and re-arrange the benchmark Room-to-Room (R2R) dataset to make it suitable for curriculum training. Experiments show that our method is model-agnostic and can significantly improve the performance, generalizability, and training efficiency of current state-of-the-art navigation agents without increasing model complexity.
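One generic way to "balance human prior knowledge and agent learning progress" is to mix a fixed prior ease score with a self-paced weight computed from the agent's current loss. The sketch below uses the standard linear self-paced weighting, v = max(0, 1 - loss / lam), and blends it with a prior score using a coefficient `beta`. The names `prior_ease`, `lam`, and `beta`, and the blending rule itself, are illustrative assumptions, not the paper's formulation.

```python
from typing import List


def self_paced_weights(losses: List[float], lam: float) -> List[float]:
    """Linear self-paced weights: 1 at zero loss, falling to 0 once loss >= lam."""
    return [max(0.0, 1.0 - loss / lam) for loss in losses]


def combined_weights(
    losses: List[float], prior_ease: List[float], lam: float, beta: float
) -> List[float]:
    """Blend a human prior ease score in [0, 1] with the agent's self-paced weight.

    beta = 1 trusts only the prior; beta = 0 trusts only learning progress.
    """
    spl = self_paced_weights(losses, lam)
    return [beta * p + (1.0 - beta) * v for p, v in zip(prior_ease, spl)]


def weighted_loss(losses: List[float], weights: List[float]) -> float:
    """Weighted average of per-sample losses; 0.0 if every weight is zero."""
    total_w = sum(weights)
    if total_w == 0.0:
        return 0.0
    return sum(w * l for w, l in zip(weights, losses)) / total_w


losses = [0.5, 1.0, 3.0]      # hypothetical per-sample navigation losses
prior_ease = [1.0, 0.5, 0.0]  # hypothetical prior: 1.0 = easiest
w = combined_weights(losses, prior_ease, lam=2.0, beta=0.5)
print(w, weighted_loss(losses, w))
```

In self-paced schemes, `lam` is usually increased over training so that samples the agent currently finds hard are admitted gradually.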
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
