Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao, Xiaohan Ding, Juexiao Feng, Yuhong Yang, Hui Chen, and Guiguang Ding

TL;DR
This paper introduces a quantization-based regularization method for vision-language models like CLIP, which enhances generalization and reduces resource costs during adaptation, especially on resource-limited devices.
Contribution
It proposes a novel quantization technique as an efficient regularizer for vision-language models, improving generalization while minimizing additional computational and storage costs.
Findings
Outperforms existing methods on 11 datasets
Reduces storage and inference costs significantly
Enhances accuracy when integrated with approaches like MaPLe
Abstract
In the past few years, large-scale pre-trained vision-language models like CLIP have achieved tremendous success in various fields. Naturally, how to transfer the rich knowledge in such huge pre-trained models to downstream tasks and datasets has become a hot topic. During downstream adaptation, the most challenging problems are overfitting and catastrophic forgetting, which can cause the model to focus too heavily on the current data and lose more crucial domain-general knowledge. Existing works use classic regularization techniques to address these problems. As solutions become increasingly complex, the ever-growing storage and inference costs have also become a significant problem that urgently needs to be addressed. In this paper, we start from the observation that proper random noise can suppress overfitting and catastrophic forgetting. We then regard quantization error as a kind of noise, and…
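To make the "quantization error as noise" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a plain uniform quantizer applied to a hypothetical vector of prompt weights; the function name `quantize_uniform`, the vector size, and the bit-widths are illustrative choices that do not come from the paper. It shows only that rounding weights to a small set of levels perturbs each weight by a bounded amount (at most half a quantization step) while reducing the number of distinct values that need to be stored.

```python
import random


def quantize_uniform(values, num_bits):
    """Round each value to the nearest of 2**num_bits evenly spaced levels.

    Hypothetical helper for illustration only; the paper's actual
    quantizer may differ. Returns (quantized_values, step_size).
    """
    lo, hi = min(values), max(values)
    levels = 2 ** num_bits - 1          # number of intervals between levels
    if hi == lo:                        # constant input: nothing to quantize
        return list(values), 0.0
    step = (hi - lo) / levels
    quantized = [lo + round((v - lo) / step) * step for v in values]
    return quantized, step


random.seed(0)
# Assumption: a learnable prompt flattened to 512 small float weights.
prompt = [random.gauss(0.0, 0.02) for _ in range(512)]

for bits in (1, 2, 4, 8):
    q, step = quantize_uniform(prompt, bits)
    # The per-weight quantization error plays the role of the "noise".
    max_err = max(abs(a - b) for a, b in zip(prompt, q))
    print(f"{bits}-bit: step={step:.5f}, max |error|={max_err:.5f}, "
          f"distinct values={len(set(q))}")
```

In this sketch, lower bit-widths give a larger step and therefore a larger perturbation, so the bit-width acts like a knob on the noise magnitude; how the paper chooses and trains with that perturbation is not specified in the text above.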
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Constraint Satisfaction and Optimization
Methods: Contrastive Language-Image Pre-training · Focus
