VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks

Yi-Lin Sung; Jaemin Cho; Mohit Bansal

arXiv:2112.06825 · cs.CV · March 25, 2022

TL;DR

This paper introduces adapter-based parameter-efficient transfer learning methods for vision-and-language models. Across diverse image-text and video-text tasks, these methods achieve performance comparable to full fine-tuning while training significantly fewer parameters.

Contribution

The paper proposes and evaluates adapter-based techniques with weight sharing for V&L models, reducing the number of trainable parameters without sacrificing performance.
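
The bottleneck adapter that these techniques build on can be sketched in a few lines. This is a minimal, dependency-free illustration, not the authors' implementation (the official code is PyTorch). The sizes d=768 and r=96, the task keys, the zero initialization of the up-projection, and the helper names BottleneckAdapter and unique_params are all hypothetical choices made for this example. Weight sharing is modeled as one simple form, sharing across tasks: every task maps to the same adapter object, so its parameters are counted once.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class BottleneckAdapter:
    """Hypothetical residual bottleneck: h + W_up(relu(W_down h + b_down)) + b_up."""

    def __init__(self, d, r, seed=0):
        rng = random.Random(seed)
        self.d, self.r = d, r
        # Down-projection d -> r with small random weights.
        self.W_down = [[rng.gauss(0.0, 0.02) for _ in range(d)] for _ in range(r)]
        self.b_down = [0.0] * r
        # Up-projection r -> d starts at zero (a common choice),
        # so the adapter is initially an identity map.
        self.W_up = [[0.0] * r for _ in range(d)]
        self.b_up = [0.0] * d

    def num_params(self):
        # Two weight matrices (r x d and d x r) plus two bias vectors.
        return 2 * self.d * self.r + self.r + self.d

    def __call__(self, h):
        # Down-project, apply ReLU, up-project, then add the residual.
        z = [max(0.0, a + b) for a, b in zip(matvec(self.W_down, h), self.b_down)]
        delta = [a + b for a, b in zip(matvec(self.W_up, z), self.b_up)]
        return [hi + di for hi, di in zip(h, delta)]


def unique_params(adapters):
    """Count each distinct adapter object once, so shared weights are not double-counted."""
    distinct = {id(a): a for a in adapters.values()}
    return sum(a.num_params() for a in distinct.values())


tasks = ["vqa", "gqa", "nlvr2", "caption"]  # hypothetical task keys
shared = BottleneckAdapter(d=768, r=96)
shared_adapters = {t: shared for t in tasks}
per_task_adapters = {t: BottleneckAdapter(d=768, r=96, seed=i) for i, t in enumerate(tasks)}

print(unique_params(shared_adapters))    # 148320
print(unique_params(per_task_adapters))  # 593280
```

Because the up-projection starts at zero, shared([1.0] * 768) returns [1.0] * 768 unchanged, so inserting the adapter does not initially perturb the frozen backbone.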

Findings

1. Adapters with weight sharing match the performance of full fine-tuning.
2. On image-text tasks, this requires training only 4.18% of the total parameters.
3. On video-text tasks, this requires training only 3.39% of the total parameters.
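
Percentages like these are ratios of trained parameters to total parameters. The sketch below shows that arithmetic, assuming the total counts both the frozen backbone and the trained modules, and illustrates why sharing one adapter across tasks lowers the ratio. Every parameter count in it is hypothetical, not a figure from the paper, and trainable_percent is an illustrative helper.

```python
def trainable_percent(frozen: int, trainable: int) -> float:
    """Share (in %) of all parameters, frozen plus trainable, that receive updates."""
    total = frozen + trainable
    return 100.0 * trainable / total


# Hypothetical sizes, not figures from the paper.
frozen_backbone = 200_000_000
adapter_per_task = 2_000_000
num_tasks = 4

# One adapter per task versus a single adapter shared by all tasks.
separate = trainable_percent(frozen_backbone, adapter_per_task * num_tasks)
shared = trainable_percent(frozen_backbone, adapter_per_task)

print(f"separate adapters: {separate:.2f}%")  # separate adapters: 3.85%
print(f"shared adapter:    {shared:.2f}%")    # shared adapter:    0.99%
```

For comparison, full fine-tuning corresponds to trainable_percent(0, 200_000_000), which is 100.0.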

Abstract

Recently, fine-tuning language models pre-trained on large text corpora has provided huge improvements on vision-and-language (V&L) tasks as well as on pure language tasks. However, fine-tuning the entire parameter set of pre-trained models becomes impractical since the model size is growing rapidly. Hence, in this paper, we introduce adapter-based parameter-efficient transfer learning techniques to V&L models such as VL-BART and VLT5. We evaluate our methods in a unified multi-task setup on both image-text and video-text benchmarks. For the image-text tasks, we use four diverse V&L datasets: VQAv2, GQA, NLVR2, and MSCOCO image captioning. For video-text tasks, we use TVQA, How2QA, TVC, and YC2C. With careful training and thorough experiments, we benchmark three popular adapter-based methods (Adapter, Hyperformer, Compacter) against the standard full fine-tuning and the recently…
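
Of the three adapter variants benchmarked, Compacter has the least obvious weight structure. It builds each adapter projection as a sum of Kronecker products, W = sum_i A_i ⊗ B_i, where each B_i is factored as a rank-one product s_i t_iᵀ and the small A_i matrices are shared across layers. The sketch below constructs such a weight in plain Python. The toy sizes and values are placeholders, and the helper names kron, outer, and phm_weight are hypothetical. It is not the Compacter or VL-Adapter code.

```python
def kron(A, B):
    """Kronecker product of matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def outer(s, t):
    """Rank-one matrix s t^T."""
    return [[si * tj for tj in t] for si in s]


def phm_weight(A_list, s_list, t_list):
    """W = sum_i kron(A_i, s_i t_i^T): a low-rank PHM weight in the style of Compacter."""
    W = None
    for A, s, t in zip(A_list, s_list, t_list):
        term = kron(A, outer(s, t))
        # Accumulate the Kronecker terms element-wise.
        W = term if W is None else [[x + y for x, y in zip(rw, rt)] for rw, rt in zip(W, term)]
    return W


# Toy example with n = 2: each A_i is 2x2, each s_i has length 2 (k/n) and
# each t_i has length 3 (d/n), so W is (2*2) x (2*3) = 4 x 6.
# All values are arbitrary placeholders.
A_list = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
s_list = [[1, 2], [0, 1]]
t_list = [[1, 1, 1], [1, 0, 0]]

W = phm_weight(A_list, s_list, t_list)
for row in W:
    print(row)
```

For a k x d projection, the rank-one factors store n·(k/n + d/n) = k + d numbers per layer instead of the k·d of a dense matrix. The n³ entries of the A_i are paid once because they are shared.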

Code & Models

Repositories

ylsung/vl_adapter
PyTorch · Official

Taxonomy

Methods: VL-T5 · Adapter