Contrastive Instruction-Trajectory Learning for Vision-Language Navigation
Xiwen Liang, Fengda Zhu, Yi Zhu, Bingqian Lin, Bing Wang, Xiaodan Liang

TL;DR
This paper introduces a contrastive learning framework for vision-language navigation that improves the robustness and generalization of navigation agents by learning distinctive representations through both coarse- and fine-grained contrastive objectives.
Contribution
The paper proposes a novel Contrastive Instruction-Trajectory Learning (CITL) framework that incorporates multi-level contrastive objectives and a sample-reweighting mechanism to enhance VLN model performance.
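This summary does not say how CITL's sample-reweighting mechanism works. As a hedged illustration of the general idea only, the sketch below reweights the negatives inside an InfoNCE loss so that harder negatives (those more similar to the anchor) count more. The function names, the softmax-style weighting rule, and the parameters `tau` and `beta` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def reweighted_info_nce(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    tau: float = 0.1,
    beta: float = 1.0,
) -> float:
    """Hypothetical InfoNCE with hard-negative reweighting (not CITL's exact rule).

    anchors[i] and positives[i] form a positive pair; every positives[j] with
    j != i is treated as a negative for anchor i.
    """
    n = len(anchors)
    if n < 2 or len(positives) != n:
        raise ValueError("need at least two aligned pairs")
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarities of anchor i to every candidate.
        logits = [cosine(anchors[i], positives[j]) / tau for j in range(n)]
        negatives = [j for j in range(n) if j != i]
        # Weights grow with similarity (harder negatives weigh more) and are
        # rescaled to average 1, so beta = 0 gives uniform weights.
        raw = [math.exp(beta * logits[j]) for j in negatives]
        scale = (n - 1) / sum(raw)
        neg_term = sum(scale * r * math.exp(logits[j]) for r, j in zip(raw, negatives))
        pos_term = math.exp(logits[i])
        total += -math.log(pos_term / (pos_term + neg_term))
    return total / n


anchors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
positives = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
plain = reweighted_info_nce(anchors, positives, beta=0.0)
hard = reweighted_info_nce(anchors, positives, beta=1.0)
print(hard >= plain)  # True
```

With `beta=0.0` every weight is 1 and the loss reduces to plain InfoNCE; raising `beta` shifts the loss toward the negatives that are hardest to tell apart from the positive.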
Findings
Outperforms previous state-of-the-art methods on the R2R, R4R, and RxR datasets.
Enhances model generalization to unseen environments.
Improves the discrimination of instruction-trajectory pairs.
Abstract
The vision-language navigation (VLN) task requires an agent to reach a target with the guidance of natural language instruction. Previous works learn to navigate step-by-step following an instruction. However, these works may fail to discriminate the similarities and discrepancies across instruction-trajectory pairs and ignore the temporal continuity of sub-instructions. These problems hinder agents from learning distinctive vision-and-language representations, harming the robustness and generalizability of the navigation policy. In this paper, we propose a Contrastive Instruction-Trajectory Learning (CITL) framework that explores invariance across similar data samples and variance across different ones to learn distinctive representations for robust navigation. Specifically, we propose: (1) a coarse-grained contrastive learning objective to enhance vision-and-language representations…
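In contrastive learning, "invariance across similar data samples and variance across different ones" is commonly implemented with an InfoNCE loss: each anchor is scored against every candidate in the batch and trained to rank its own positive highest. The sketch below applies this to paired instruction and trajectory embeddings in both directions. It assumes each instruction and trajectory has already been encoded into a fixed-size vector and that pairs are aligned by index; the encoders, the temperature `tau`, and the exact form of CITL's coarse-grained objective are not given in this summary, so this is a generic illustration rather than the paper's loss.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def info_nce(queries: List[Vector], keys: List[Vector], tau: float = 0.07) -> float:
    """InfoNCE with in-batch negatives: queries[i] is paired with keys[i]."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / tau for kj in k]
        # Cross-entropy with the matching key as the correct "class".
        loss += logsumexp(logits) - logits[i]
    return loss / len(q)


def symmetric_info_nce(instr: List[Vector], traj: List[Vector], tau: float = 0.07) -> float:
    """Average of the instruction->trajectory and trajectory->instruction losses."""
    return 0.5 * (info_nce(instr, traj, tau) + info_nce(traj, instr, tau))


instr = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
traj = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.2, 0.1, 0.9]]
aligned = symmetric_info_nce(instr, traj)
shuffled = symmetric_info_nce(instr, traj[1:] + traj[:1])
print(aligned < shuffled)  # True
```

Shuffling the trajectories breaks the pairing, so the loss rises; the in-batch negatives are what push mismatched instruction-trajectory pairs apart.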
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Contrastive Learning
