Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models

Shuai Fu; Xiequn Wang; Qiushi Huang; Yu Zhang

arXiv:2408.13979 · cs.CV · August 28, 2024


TL;DR

Nemesis introduces a normalization technique for soft prompts in vision-language models, revealing that adjusting prompt norms can improve model performance and offering new insights into prompt tuning strategies.

Contribution

This work is the first to systematically analyze the impact of soft prompt norms in VLMs and proposes a normalization method called Nemesis to enhance their performance.

Findings

1. Reducing the norms of certain learned prompts can occasionally improve VLM performance.
2. Increasing prompt norms often degrades model accuracy.
3. Normalizing soft prompts leads to better downstream task results.
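
To make the notion of "changing a prompt norm" concrete, here is a minimal sketch of rescaling the vector at one prompt position. The function name `rescale_position`, the toy two-dimensional vectors, and the scaling factors are illustrative assumptions, not taken from the paper or its repository; a real soft prompt would be a learned tensor inside the text encoder's input.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def rescale_position(prompt, position, factor):
    """Return a copy of `prompt` with the vector at `position` multiplied by `factor`.

    `prompt` is a list of vectors, one per prompt position (hypothetical layout).
    The input is left unmodified.
    """
    corrupted = [list(v) for v in prompt]
    corrupted[position] = [factor * x for x in corrupted[position]]
    return corrupted

# Toy soft prompt with three positions (assumed values, for illustration only).
prompt = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
shrunk = rescale_position(prompt, 0, 0.5)  # halves the norm at position 0
grown = rescale_position(prompt, 0, 2.0)   # doubles the norm at position 0
print(l2_norm(prompt[0]), l2_norm(shrunk[0]), l2_norm(grown[0]))  # 5.0 2.5 10.0
```

In an actual experiment, each corrupted prompt would be fed through the frozen VLM and its accuracy compared with that of the uncorrupted prompt; that evaluation step is omitted here.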

Abstract

With the prevalence of large-scale pretrained vision-language models (VLMs), such as CLIP, soft-prompt tuning has become a popular method for adapting these models to various downstream tasks. However, few works delve into the inherent properties of learnable soft-prompt vectors, specifically the impact of their norms on the performance of VLMs. This motivates us to pose an unexplored research question: "Do we need to normalize the soft prompts in VLMs?" To fill this research gap, we first uncover a phenomenon, called the Low-Norm Effect, by performing extensive corruption experiments, suggesting that reducing the norms of certain learned prompts occasionally enhances the performance of VLMs, while increasing them often degrades it. To harness this effect, we propose a novel method named Normalizing the soft-prompt vectors of…

Peer Reviews

Decision: ICLR 2024 spotlight

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 4

Strengths

1. The paper is the first study to discuss the influence of soft prompts on VLMs.
2. The paper conducts REPLACE and RESCALE experiments to examine the normalization of soft prompts, and proposes Nemesis, which includes two normalization losses, to improve the effectiveness of soft prompts.
3. The paper conducts many experiments to demonstrate the effectiveness of the method.

Weaknesses

1. The writing in some parts of the paper is not clear enough, and the authors should check it. For example, there is a discrepancy between Formula 4 and the symbol definitions in the preceding paragraph.
2. The two proposed losses are not tied to any practical interpretation; the authors should discuss why the two forms of normalization affect soft prompts.
3. The paper lacks a discussion of the scenarios in which each of the two normalization losses is applicable.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. The paper pioneers a systematic investigation into the role of soft-prompt vector norms in VLMs, addressing a previously unexplored research question.
2. The proposed Nemesis method, with its innovative PEN and PAN losses, offers a potential solution to the Low-Norm Effect, showing promise for improving VLM performance.
3. Extensive corruption experiments shed light on the Low-Norm Effect's impact, providing valuable insights for future soft-prompt tuning endeavors.

Weaknesses

1. $\beta$ can be either 0 or 1, corresponding to two variants of the proposed Nemesis method. However, there is no ablation study on the selection of $\beta$, nor is there an exploration of the potential impact of setting $\beta$ with decimal values to assign weights to the two methods.
2. The paper introduces a pre-inference step before each training batch to identify positions inducing the Low-Norm Effect. Such a step could introduce computational overhead, especially with larger datasets or
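
The first weakness above asks what happens when $\beta$ takes a value strictly between 0 and 1. A minimal sketch of such a weighting follows, under loudly stated assumptions: the two penalty functions are generic stand-ins (a mean norm over all positions and a mean norm over a chosen subset of positions), not the paper's PEN and PAN definitions, which this page does not give; the mapping of $\beta = 1$ to the first penalty is likewise assumed, and all names are hypothetical.

```python
import math

def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))

def all_positions_penalty(prompt):
    """Stand-in penalty: mean L2 norm over every prompt position."""
    return sum(l2_norm(v) for v in prompt) / len(prompt)

def selected_positions_penalty(prompt, positions):
    """Stand-in penalty: mean L2 norm over the given positions (0.0 if none)."""
    if not positions:
        return 0.0
    return sum(l2_norm(prompt[i]) for i in positions) / len(positions)

def mixed_penalty(prompt, positions, beta):
    """Convex combination of the two stand-in penalties, weighted by beta in [0, 1]."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return (beta * all_positions_penalty(prompt)
            + (1.0 - beta) * selected_positions_penalty(prompt, positions))

# Toy prompt and an assumed set of selected positions, for illustration only.
prompt = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
selected = [0]
for beta in (0.0, 0.5, 1.0):
    print(beta, mixed_penalty(prompt, selected, beta))
```

With $\beta$ fixed at 0 or 1 the combination collapses to one penalty alone, which matches the two variants the reviewer describes; intermediate values are the unexplored case.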

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

(1) A new soft-prompt vector normalization method for VLMs, which can be incorporated into any soft-prompt-based method; (2) better results in domain generalization settings for VLMs.

Weaknesses

1. It would help to have more details on how the length of the soft-prompt vectors was chosen, e.g., why 4 and 16. Would other ranges be worth investigating based on the specific tasks for VLMs?
2. It would help to see further investigation of combining Nemesis with existing PEFT algorithms, to check whether the results can be improved further, so that other researchers can more easily integrate the method into their existing frameworks.

Code & Models

Repositories

shyfoo/nemesis (PyTorch, official)


Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Contrastive Language-Image Pre-training