Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration

Xiwen Liang; Fengda Zhu; Lingling Li; Hang Xu; Xiaodan Liang

arXiv:2203.04006 · cs.CV · March 9, 2022

TL;DR

This paper introduces ProbES, a prompt-based self-exploration method that leverages a large-scale cross-modal pretrained model to generate training data and adapt quickly to new environments in vision-language navigation tasks.

Contribution

The paper proposes a novel prompt-based self-exploration approach that eliminates the need for human-labeled data and enhances cross-domain adaptation in VLN tasks.

Findings

1. ProbES improves generalization in unseen environments.
2. It enables automatic environment exploration and instruction generation.
3. The method enhances adaptation speed and efficiency.

Abstract

Vision-language navigation (VLN) is a challenging task due to its large search space in the environment. To address this problem, previous works have proposed fine-tuning a large model pretrained on large-scale datasets. However, conventional fine-tuning methods require extra human-labeled navigation data and lack the ability to self-explore environments, which hinders their generalization to unseen scenes. To improve fast cross-domain adaptation, we propose Prompt-based Environmental Self-exploration (ProbES), which self-explores environments by sampling trajectories and automatically generates structured instructions via a large-scale cross-modal pretrained model (CLIP). Our method fully utilizes the knowledge learned from CLIP to build an in-domain dataset through self-exploration, without human labeling. Unlike the conventional…
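The self-exploration loop described in the abstract (sample a trajectory in the environment, then generate an instruction for it with CLIP) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ProbES implementation: the navigation graph, the instruction template, the 2-d embeddings, and every function name here are hypothetical, and the CLIP image and text embeddings are assumed to be precomputed rather than produced by a real CLIP model. Label selection uses CLIP-style zero-shot matching (highest cosine similarity between an image embedding and each label's text embedding).

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical navigation graph: viewpoint id -> ids of navigable neighbours.
Graph = Dict[str, List[str]]
Vector = Sequence[float]


def sample_trajectory(graph: Graph, start: str, length: int,
                      rng: random.Random) -> List[str]:
    """Random walk that never revisits a viewpoint; stops early at dead ends."""
    path = [start]
    for _ in range(length - 1):
        candidates = [v for v in graph[path[-1]] if v not in path]
        if not candidates:
            break
        path.append(rng.choice(candidates))
    return path


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_label(image_emb: Vector, label_embs: Dict[str, Vector]) -> str:
    """CLIP-style zero-shot choice: the label whose (assumed precomputed)
    text embedding is most similar to the image embedding."""
    return max(label_embs, key=lambda lbl: cosine(image_emb, label_embs[lbl]))


def generate_instruction(template: str, trajectory: Sequence[str],
                         view_embs: Dict[str, Vector],
                         room_embs: Dict[str, Vector],
                         object_embs: Dict[str, Vector]) -> str:
    """Fill a hypothetical template with labels predicted at the start and goal viewpoints."""
    start, goal = trajectory[0], trajectory[-1]
    return template.format(
        start_room=zero_shot_label(view_embs[start], room_embs),
        goal_room=zero_shot_label(view_embs[goal], room_embs),
        goal_object=zero_shot_label(view_embs[goal], object_embs),
    )


# Toy usage: a three-viewpoint corridor a - b - c with made-up embeddings.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
traj = sample_trajectory(graph, "a", 3, random.Random(0))  # ['a', 'b', 'c']
view_embs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
room_embs = {"kitchen": [1.0, 0.1], "bedroom": [0.1, 1.0]}
object_embs = {"stove": [0.9, 0.2], "bed": [0.2, 0.9]}
template = "Leave the {start_room}, walk into the {goal_room} and stop next to the {goal_object}."
instruction = generate_instruction(template, traj, view_embs, room_embs, object_embs)
# instruction == "Leave the kitchen, walk into the bedroom and stop next to the bed."
```

Each (trajectory, instruction) pair produced this way would be one unlabeled-by-humans training sample; repeating the loop over many start viewpoints yields an in-domain dataset. The fixed template and toy embeddings are simplifications chosen to keep the sketch self-contained.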

Code & Models

Repositories

liangcici/probes-vln
Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training