UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Haoyu Lu; Yuqi Huo; Guoxing Yang; Zhiwu Lu; Wei Zhan; Masayoshi Tomizuka; Mingyu Ding

arXiv:2302.06605 · cs.CV · May 23, 2023 · 20 cites

TL;DR

UniAdapter introduces a parameter-efficient method for cross-modal transfer learning that unifies unimodal and multimodal adapters, achieving superior performance with minimal additional parameters across various vision-language tasks.

Contribution

The paper presents UniAdapter, a unified adapter framework that reduces parameters via weight sharing, enabling effective cross-modal transfer learning without full fine-tuning.

Findings

01 Outperforms state-of-the-art methods on multiple cross-modal benchmarks.

02 Tunes only 1.0%-2.0% of the pre-trained model's parameters.

03 Achieves superior results on the MSR-VTT retrieval task.

Abstract

Large-scale vision-language pre-trained models have shown promising transferability to various downstream tasks. As the size of these foundation models and the number of downstream tasks grow, the standard full fine-tuning paradigm becomes unsustainable due to heavy computational and storage costs. This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation on pre-trained vision-language models. Specifically, adapters are distributed to different modalities and their interactions, with the total number of tunable parameters reduced by partial weight sharing. The unified and knowledge-sharing design enables powerful cross-modal representations that can benefit various downstream tasks, requiring only 1.0%-2.0% tunable parameters of the pre-trained model. Extensive experiments on 6 cross-modal downstream benchmarks…
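To give a feel for why partial weight sharing lowers the tunable-parameter count, here is a minimal sketch. It assumes each adapter is a standard bottleneck (a down-projection followed by an up-projection, both with biases) and that sharing means several adapters reuse one down-projection while keeping their own up-projections. The function names and the sizes (hidden dimension 768, bottleneck 128, three adapters) are hypothetical choices for illustration, not values taken from the paper.

```python
def bottleneck_adapter_params(hidden_dim: int, bottleneck_dim: int) -> dict:
    """Count weights and biases of one down-project / up-project adapter."""
    down = hidden_dim * bottleneck_dim + bottleneck_dim  # W_down + b_down
    up = bottleneck_dim * hidden_dim + hidden_dim        # W_up + b_up
    return {"down": down, "up": up}


def tunable_params(hidden_dim: int, bottleneck_dim: int,
                   num_adapters: int, share_down: bool) -> int:
    """Total tunable parameters for a group of adapters.

    Assumption: when share_down is True, all adapters in the group reuse a
    single down-projection and keep separate up-projections.
    """
    parts = bottleneck_adapter_params(hidden_dim, bottleneck_dim)
    num_down = 1 if share_down else num_adapters
    return num_down * parts["down"] + num_adapters * parts["up"]


# Hypothetical sizes, chosen only for illustration.
HIDDEN_DIM = 768
BOTTLENECK_DIM = 128
NUM_ADAPTERS = 3  # e.g. one visual, one textual, one cross-modal adapter

separate = tunable_params(HIDDEN_DIM, BOTTLENECK_DIM, NUM_ADAPTERS, share_down=False)
shared = tunable_params(HIDDEN_DIM, BOTTLENECK_DIM, NUM_ADAPTERS, share_down=True)

print(f"separate adapters: {separate} parameters")
print(f"shared down-projection: {shared} parameters")
print(f"relative saving: {1 - shared / separate:.1%}")
```

Under these assumptions the saving grows with the number of adapters that share a projection, since each extra adapter adds only an up-projection. The 1.0%-2.0% figure quoted in the abstract depends on the actual backbone and adapter configuration, which this sketch does not model.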

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques