VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation

Yanyuan Qiao; Zheng Yu; Qi Wu

arXiv:2308.10172·cs.CV·August 22, 2023

Open Access · 1 Repo

TL;DR

This paper introduces VLN-PETL, a parameter-efficient transfer learning approach tailored for vision-and-language navigation tasks, achieving comparable or superior results to full fine-tuning while significantly reducing training costs.

Contribution

The paper presents the first VLN-specific PETL method with novel modules, improving efficiency and performance in VLN tasks compared to existing PETL techniques.

Findings

01

VLN-PETL achieves comparable or better performance than full fine-tuning.

02

VLN-PETL outperforms existing PETL methods on four VLN benchmarks.

03

The proposed modules enhance cross-modal and historical interaction in VLN models.

Abstract

Performance on Vision-and-Language Navigation (VLN) tasks has progressed rapidly thanks to large pre-trained vision-and-language models. However, fully fine-tuning the pre-trained model for every downstream VLN task is becoming costly because of the considerable model size. Parameter-Efficient Transfer Learning (PETL), a recent research hotspot, shows great potential for efficiently tuning large pre-trained models on common CV and NLP tasks: it exploits most of the representation knowledge in the pre-trained model while tuning only a minimal set of parameters. However, simply applying existing PETL methods to the more challenging VLN tasks may cause non-trivial performance degradation. Therefore, we present the first study to explore PETL methods for VLN tasks and propose a VLN-specific PETL method named VLN-PETL. Specifically, we…
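
To make the "minimal set of parameters" idea concrete, the sketch below estimates what fraction of parameters is trained when small bottleneck adapters are added to a frozen Transformer encoder. This is a generic adapter-style illustration, not the VLN-PETL modules themselves; the layer sizes, the bottleneck width, and all function names are assumptions chosen for the example.

```python
def transformer_layer_params(d_model: int, d_ff: int) -> int:
    """Weights and biases of one standard Transformer encoder layer."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V, and output projections
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * 2 * d_model                  # two LayerNorms, scale and shift each
    return attention + feed_forward + layer_norms


def adapter_params(d_model: int, bottleneck: int) -> int:
    """Down-projection and up-projection of a bottleneck adapter, with biases."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up


def trainable_fraction(num_layers: int, d_model: int, d_ff: int,
                       bottleneck: int, adapters_per_layer: int = 2) -> float:
    """Share of all parameters that is trained when the backbone stays frozen."""
    frozen = num_layers * transformer_layer_params(d_model, d_ff)
    tuned = num_layers * adapters_per_layer * adapter_params(d_model, bottleneck)
    return tuned / (frozen + tuned)


if __name__ == "__main__":
    # Hypothetical BERT-base-like encoder: 12 layers, hidden size 768, FFN size 3072.
    fraction = trainable_fraction(num_layers=12, d_model=768, d_ff=3072, bottleneck=64)
    print(f"trainable share: {fraction:.2%}")
```

Under these assumed sizes the adapters account for under 3% of the encoder's parameters, which is the kind of saving PETL methods aim for; the figures reported for VLN-PETL itself should be taken from the paper.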

Code & Models

Repositories

yanyuanqiao/vln-petl
Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques