CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations

Jialu Li; Hao Tan; Mohit Bansal

arXiv:2207.02185 · cs.CV · July 6, 2022

TL;DR

This paper introduces CLEAR, a method that improves vision-language navigation by learning cross-lingual, environment-agnostic representations that generalize better across languages and to unseen environments.

Contribution

The paper proposes learning a shared, visually-aligned cross-lingual language representation and an environment-agnostic visual representation for VLN tasks, improving generalization and transferability.

Findings

01 Significant performance improvements on the Room-Across-Room dataset.

02 Effective transfer of learned representations to other VLN tasks.

03 Enhanced generalization to unseen environments and languages.

Abstract

Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions. In this paper, we aim to solve two key challenges in this task: utilizing multilingual instructions for improved instruction-path grounding and navigating through new environments that are unseen during training. To address these challenges, we propose CLEAR: Cross-Lingual and Environment-Agnostic Representations. First, our agent learns a shared and visually-aligned cross-lingual language representation for the three languages (English, Hindi and Telugu) in the Room-Across-Room dataset. Our language representation learning is guided by text pairs that are aligned by visual information. Second, our agent learns an environment-agnostic visual representation by maximizing the similarity between semantically-aligned image pairs (with constraints on…
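
The similarity-maximization idea in the abstract can be illustrated with a minimal sketch. The abstract is truncated here and does not give the actual objective, so everything below is an assumption: cosine similarity stands in for the similarity measure, feature vectors are plain lists of floats, and the names `cosine_similarity` and `alignment_loss` are hypothetical and not taken from the paper or its repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def alignment_loss(pairs):
    """Mean of (1 - cosine similarity) over aligned feature pairs.

    Assumed stand-in objective: minimizing this value maximizes the
    similarity between the two members of each pair.
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical features for two semantically-aligned image pairs.
pairs = [
    ([1.0, 0.0, 0.5], [0.9, 0.1, 0.4]),
    ([0.2, 0.8, 0.1], [0.1, 0.9, 0.0]),
]
print(alignment_loss(pairs))
```

The loss is 0 when every pair points in the same direction and grows as the paired features diverge, so minimizing it pulls aligned features together. The same shape of objective would apply to text pairs if the feature vectors came from a language encoder instead of a visual one.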

Code & Models

Repositories

jialuli-luka/clear
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques