Sensitivity of Generative VLMs to Semantically and Lexically Altered Prompts

Sri Harsha Dumpala; Aman Jaiswal; Chandramouli Sastry; Evangelos Milios; Sageev Oore; Hassan Sajjad

arXiv:2410.13030·cs.CV·October 18, 2024

Open Access

TL;DR

This paper investigates how generative vision-language models (VLMs) respond to lexical and semantic changes in prompts. It finds that the models are highly sensitive to lexical alterations that leave the meaning unchanged, and that this sensitivity hurts techniques aimed at making model outputs consistent.

Contribution

The study uses the SugarCrepe++ dataset to analyze how sensitive generative VLMs are to lexical alterations in prompts that carry no corresponding semantic change, and identifies this sensitivity as a significant vulnerability.

Findings

01

Generative VLMs are highly sensitive to lexical changes in prompts, even when the meaning is unchanged.

02

Lexical alterations impact the consistency of model outputs.

03

This sensitivity affects the performance of techniques aimed at achieving consistent model outputs.

Abstract

Despite the significant influx of prompt-tuning techniques for generative vision-language models (VLMs), it remains unclear how sensitive these models are to lexical and semantic alterations in prompts. In this paper, we evaluate the ability of generative VLMs to understand lexical and semantic changes in text using the SugarCrepe++ dataset. We analyze the sensitivity of VLMs to lexical alterations in prompts without corresponding semantic changes. Our findings demonstrate that generative VLMs are highly sensitive to such alterations. Additionally, we show that this vulnerability affects the performance of techniques aimed at achieving consistency in their outputs.
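To make the idea of sensitivity to lexical alterations concrete, the sketch below measures how often a model gives the same answer across prompts that differ in wording but not in meaning. It is only an illustration under stated assumptions, not the paper's evaluation protocol or metric: `consistency_rate`, `toy_model`, and the example prompts are hypothetical, and the image is assumed to be held fixed inside the answer function.

```python
from collections import Counter
from typing import Callable, Sequence


def consistency_rate(answer_fn: Callable[[str], str], prompts: Sequence[str]) -> float:
    """Fraction of prompts whose answer equals the most common answer.

    `answer_fn` is a hypothetical stand-in for querying a VLM with a fixed
    image and a text prompt; it is not an API from the paper.
    """
    if not prompts:
        raise ValueError("need at least one prompt")
    # Normalize answers so trivial casing/whitespace differences do not count.
    answers = [answer_fn(p).strip().lower() for p in prompts]
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers)


def toy_model(prompt: str) -> str:
    # Toy behavior (assumption): the answer flips on a purely lexical cue.
    return "No" if "picture" in prompt else "Yes"


# Hypothetical prompts with the same meaning but different wording.
variants = [
    "Does the image show a dog on a couch?",
    "Is there a dog on a couch in the image?",
    "Does the picture show a dog on a couch?",
]
rate = consistency_rate(toy_model, variants)
```

A rate of 1.0 means the answer never changed across the rewordings; lower values indicate that wording alone moved the output.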

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems