VPGTrans: Transfer Visual Prompt Generator across LLMs

Ao Zhang, Hao Fei, Yuan Yao, Wei Ji, Li Li, Zhiyuan Liu, and Tat-Seng Chua

arXiv:2305.01278 · cs.CV · October 25, 2023 · 5 cites

TL;DR

This paper introduces VPGTrans, a transfer learning framework that efficiently transfers visual prompt generators across different large language models, significantly reducing training costs and time.

Contribution

The paper pioneers the study of VPG transferability across LLMs and proposes a two-stage transfer framework that enhances efficiency and reduces resource consumption.
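As a rough illustration of what a two-stage VPG transfer could look like, here is a minimal PyTorch sketch. Everything in it is an assumption made for illustration: the summary above does not say what the two stages are, so the split used here (first warm up only the projector while the inherited VPG stays frozen, then fine-tune VPG and projector together) is one plausible reading, and the tiny linear layers, the random data, the learning rates, and the helper names `set_trainable` and `train` are hypothetical stand-ins rather than the authors' code.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: tiny linear layers instead of a real VPG, projector, and LLM.
vpg = nn.Linear(8, 4)          # plays the role of a VPG inherited from a source MLLM
projector = nn.Linear(4, 6)    # maps VPG outputs into the target LLM's input space
frozen_llm = nn.Linear(6, 1)   # plays the role of the target LLM, never updated
for p in frozen_llm.parameters():
    p.requires_grad_(False)

# Random toy data in place of image features and training targets.
images = torch.randn(32, 8)
targets = torch.randn(32, 1)


def set_trainable(module, flag):
    """Enable or disable gradient updates for every parameter of a module."""
    for p in module.parameters():
        p.requires_grad_(flag)


def train(modules, lr, steps):
    """Run a few SGD steps that update only the given modules."""
    params = [p for m in modules for p in m.parameters()]
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(frozen_llm(projector(vpg(images))), targets)
        loss.backward()
        optimizer.step()
    return loss.item()


vpg_before = vpg.weight.detach().clone()
projector_before = projector.weight.detach().clone()

# Stage 1 (assumed): warm up the projector only; the inherited VPG is frozen.
set_trainable(vpg, False)
train([projector], lr=0.05, steps=20)
vpg_after_stage1 = vpg.weight.detach().clone()

# Stage 2 (assumed): ordinary fine-tuning of VPG and projector together.
set_trainable(vpg, True)
train([vpg, projector], lr=0.01, steps=20)
vpg_after_stage2 = vpg.weight.detach().clone()
```

In this toy setup the VPG weights are untouched after the first stage and only move in the second, which is the point of separating the stages: the cheap projector adapts to the new LLM before the more expensive VPG is updated.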

Findings

01

VPGTrans achieves over 10x speed-up in VPG transfer compared with training a VPG for the target LLM from scratch.

02

It needs only about 10.7% of the training data required to train a VPG for the target LLM from scratch.

03

The method effectively adapts VPGs across different LLM sizes and types.

Abstract

While developing a new multimodal LLM (MLLM) by pre-training on tremendous image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm. However, further tuning the VPG part of the MLLM still suffers from indispensable computational costs, i.e., requiring thousands of GPU hours and millions of training data. One alternative solution is to transfer an existing VPG from any existing MLLMs for the target MLLM. In this work, we for the first time investigate the VPG transferability across LLMs, and explore a solution to reduce the cost of VPG transfer. We first study the VPG transfer across different LLM sizes (e.g., small-to-large), and across different LLM types, through which we diagnose the key factors to maximize the transfer efficiency. Based on our…

Code & Models

Repositories

vpgtrans/vpgtrans (PyTorch, official)

Videos

VPGTrans: Transfer Visual Prompt Generator across LLMs · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings