Quantized Prompt for Efficient Generalization of Vision-Language Models

Tianxiang Hao; Xiaohan Ding; Juexiao Feng; Yuhong Yang; Hui Chen; Guiguang Ding

arXiv:2407.10704 · cs.CV · July 23, 2024

TL;DR

This paper introduces a quantization-based regularization method for vision-language models like CLIP, which enhances generalization and reduces resource costs during adaptation, especially on resource-limited devices.

Contribution

It proposes a novel quantization technique as an efficient regularizer for vision-language models, improving generalization while minimizing additional computational and storage costs.

Findings

1. Outperforms existing methods on 11 datasets
2. Reduces storage and inference costs significantly
3. Enhances accuracy when integrated with approaches like MaPLe

Abstract

In the past few years, large-scale pre-trained vision-language models like CLIP have achieved tremendous success in various fields. Naturally, how to transfer the rich knowledge in such huge pre-trained models to downstream tasks and datasets has become a hot topic. During downstream adaptation, the most challenging problems are overfitting and catastrophic forgetting, which can cause the model to focus too heavily on the current data and lose more crucial domain-general knowledge. Existing works use classic regularization techniques to address these problems. As solutions become increasingly complex, the ever-growing storage and inference costs have also become a significant problem that urgently needs to be addressed. In this paper, we start from the observation that proper random noise can suppress overfitting and catastrophic forgetting. Then we regard quantization error as a kind of noise, and…
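The idea of treating quantization error as noise can be illustrated with a minimal sketch. It assumes a plain uniform (min-max) quantizer applied to a hypothetical prompt vector; the quantization scheme actually used in the paper is not given in this excerpt, and the names `uniform_quantize` and `quantization_noise` are illustrative only.

```python
import random

def uniform_quantize(values, num_bits):
    """Snap each value to the nearest of 2**num_bits evenly spaced levels
    spanning [min(values), max(values)]. Assumed scheme, not the paper's."""
    intervals = 2 ** num_bits - 1          # number of gaps between levels
    lo, hi = min(values), max(values)
    if hi == lo:                           # constant input: nothing to quantize
        return list(values)
    step = (hi - lo) / intervals
    return [lo + round((v - lo) / step) * step for v in values]

def quantization_noise(values, num_bits):
    """Return the element-wise error introduced by quantization."""
    quantized = uniform_quantize(values, num_bits)
    return [q - v for q, v in zip(quantized, values)]

# Hypothetical prompt vector: 512 small Gaussian-initialized weights.
rng = random.Random(0)
prompt = [rng.gauss(0.0, 0.02) for _ in range(512)]

for bits in (1, 2, 4, 8):
    noise = quantization_noise(prompt, bits)
    max_err = max(abs(e) for e in noise)
    print(f"{bits}-bit: max |error| = {max_err:.6f}")
```

In this sketch the error is bounded by half a quantization step, so a lower bit width both shrinks storage and injects a larger perturbation into the weights, which is the sense in which quantization error can act like noise.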

Code & Models

Repositories

beyondhtx/qprompt (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Constraint Satisfaction and Optimization

Methods: Contrastive Language-Image Pre-training · Focus