Towards Compatible Fine-tuning for Vision-Language Model Updates

Zhengbo Wang; Jian Liang; Lijun Sheng; Ran He; Zilei Wang; Tieniu Tan

arXiv:2412.20895·cs.CV·December 31, 2024

TL;DR

This paper introduces ContCoOp, a fine-tuning method for vision-language models that stays effective across model updates by dynamically adapting its prompts. Experiments on 15 datasets support the claim.

Contribution

We propose ContCoOp, a new prompt optimization approach that enhances compatibility with model updates, addressing a key limitation of existing fine-tuning methods.

Findings

01. ContCoOp outperforms baseline methods in compatibility with model updates.

02. It achieves robust out-of-distribution generalization.

03. Its effectiveness is demonstrated across 15 datasets.

Abstract

Efficient fine-tuning has become a popular strategy for enhancing the capabilities of foundation models on downstream tasks by learning plug-and-play modules. However, existing methods overlook a crucial issue: if the underlying foundation model is updated, are these plug-and-play modules still effective? In this paper, we first conduct a detailed analysis of various fine-tuning methods on CLIP in terms of their compatibility with model updates. The study reveals that many high-performing fine-tuning methods fail to be compatible with the upgraded models. To address this, we propose a novel approach, Class-conditioned Context Optimization (ContCoOp), which integrates learnable prompts with class embeddings using an attention layer before inputting them into the text encoder. Consequently, the prompts can dynamically adapt to the changes in embedding space (due to model…
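The idea of conditioning learnable prompts on class embeddings through attention can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the attention layout, so the choice below (context vectors as queries, class-name token embeddings as keys and values, a residual connection, no learned projections) is an assumption, and all names and the toy embeddings are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def class_conditioned_prompts(context, class_tokens):
    """Single-head cross-attention (assumed layout, not from the paper).

    context      : list of learnable context vectors (queries)
    class_tokens : list of class-name token embeddings (keys and values)
    Returns each context vector plus its attended class information.
    """
    scale = math.sqrt(len(context[0]))
    prompts = []
    for q in context:
        weights = softmax([dot(q, k) / scale for k in class_tokens])
        attended = [
            sum(w * v[j] for w, v in zip(weights, class_tokens))
            for j in range(len(q))
        ]
        # Residual connection: the prompt keeps its learned part and
        # adds a component that follows the class embedding.
        prompts.append([qi + ai for qi, ai in zip(q, attended)])
    return prompts


rng = random.Random(0)
dim, n_ctx = 4, 3
context = [[rng.gauss(0, 0.02) for _ in range(dim)] for _ in range(n_ctx)]

# Hypothetical token embeddings of one class name under two model versions;
# the "update" is simulated as a constant shift of the embedding space.
class_tokens_v1 = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
class_tokens_v2 = [[x + 0.5 for x in tok] for tok in class_tokens_v1]

prompts_v1 = class_conditioned_prompts(context, class_tokens_v1)
prompts_v2 = class_conditioned_prompts(context, class_tokens_v2)
```

In this toy setup the same learned `context` yields different prompts for the two embedding tables. Because the simulated update shifts every token by the same offset, the attention weights are unchanged and each prompt simply moves by that offset, which is the sense in which a class-conditioned prompt can track a shifted embedding space. A real model update would change the embeddings in less regular ways.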

Taxonomy

Topics: Semantic Web and Ontologies · Robotics and Automated Systems

Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training