VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation
Yanyuan Qiao, Zheng Yu, Qi Wu

TL;DR
This paper introduces VLN-PETL, a parameter-efficient transfer learning approach tailored for vision-and-language navigation tasks, achieving comparable or superior results to full fine-tuning while significantly reducing training costs.
Contribution
The paper presents the first VLN-specific PETL method with novel modules, improving efficiency and performance in VLN tasks compared to existing PETL techniques.
Findings
VLN-PETL achieves comparable or better performance than full fine-tuning.
VLN-PETL outperforms existing PETL methods on four VLN benchmarks.
The proposed modules enhance cross-modal and historical interaction in VLN models.
Abstract
Vision-and-Language Navigation (VLN) tasks have seen rapid progress recently thanks to the use of large pre-trained vision-and-language models. However, fully fine-tuning the pre-trained model for every downstream VLN task is becoming costly due to the considerable model size. Parameter-Efficient Transfer Learning (PETL), a recent research hotspot, shows great potential for efficiently tuning large pre-trained models on common CV and NLP tasks: it exploits most of the representation knowledge in the pre-trained model while tuning only a minimal set of parameters. However, simply applying existing PETL methods to the more challenging VLN tasks may cause non-trivial performance degradation. Therefore, we present the first study to explore PETL methods for VLN tasks and propose a VLN-specific PETL method named VLN-PETL. Specifically, we…
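To make "tuning only a minimal set of parameters" concrete, the following sketch counts parameters for a generic bottleneck-adapter setup, one common PETL technique. It is not the VLN-PETL modules themselves. All sizes (12 layers, hidden size 768, feed-forward size 3072, bottleneck 64) and all function names are illustrative assumptions, not values taken from the paper.

```python
# Illustrative arithmetic only: every size below is an assumption,
# not a figure reported for VLN-PETL.

def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return d_in * d_out + d_out


def transformer_layer_params(hidden: int, ffn: int) -> int:
    """Rough count for one Transformer encoder layer."""
    attention = 4 * linear_params(hidden, hidden)  # Q, K, V, output projections
    feed_forward = linear_params(hidden, ffn) + linear_params(ffn, hidden)
    layer_norms = 2 * 2 * hidden  # scale and shift for two LayerNorms
    return attention + feed_forward + layer_norms


def adapter_params(hidden: int, bottleneck: int) -> int:
    """Generic bottleneck adapter: down-projection followed by up-projection."""
    return linear_params(hidden, bottleneck) + linear_params(bottleneck, hidden)


def trainable_fraction(num_layers: int, hidden: int, ffn: int,
                       bottleneck: int, adapters_per_layer: int = 1) -> float:
    """Share of parameters that are trained when the backbone stays frozen."""
    frozen = num_layers * transformer_layer_params(hidden, ffn)
    trainable = num_layers * adapters_per_layer * adapter_params(hidden, bottleneck)
    return trainable / (frozen + trainable)


if __name__ == "__main__":
    share = trainable_fraction(num_layers=12, hidden=768, ffn=3072, bottleneck=64)
    print(f"trainable share: {share:.2%}")
```

Under these assumed sizes the adapters account for roughly 1.4% of all parameters, which is the kind of saving that makes PETL attractive when one backbone has to serve several downstream tasks.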
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
