Visual Perception Generalization for Vision-and-Language Navigation via Meta-Learning

Ting Wang; Zongkai Wu; Donglin Wang

arXiv:2012.05446 · cs.RO · January 20, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a meta-learning approach to enable vision-and-language navigation agents to quickly adapt to different camera configurations, improving transferability across diverse real-world robotic platforms.

Contribution

The paper proposes a visual perception generalization strategy based on meta-learning and compares MAML with a metric-based method across seen and unseen environments.

Findings

01

Meta-learning improves camera configuration transferability.

02

MAML excels in unseen environments.

03

Metric-based method performs well in seen environments.

Abstract

Vision-and-language navigation (VLN) is a challenging task that requires an agent to navigate real-world environments by understanding natural language instructions and visual information received in real time. Prior works have implemented VLN in continuous environments or on physical robots, but because of dataset limitations all of them use a fixed camera configuration, such as a height of 1.5 meters and a 90-degree horizontal field of view (HFOV). However, real-life robots built for different purposes have different camera configurations, and the resulting gap in visual information makes it difficult to directly transfer a learned navigation model between robots. In this paper, we propose a visual perception generalization strategy based on meta-learning, which enables the agent to adapt quickly to a new camera configuration with a few shots. In the training phase, we first locate the…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques