PRSM: A Measure to Evaluate CLIP's Robustness Against Paraphrases
Udo Schlegel, Franziska Weeber, Jian Lan, Thomas Seidl

TL;DR
This paper introduces PRSM, a new metric to evaluate CLIP's robustness to paraphrasing, revealing variability in stability across paraphrasing strategies and demographic biases.
Contribution
The paper proposes PRSM, a novel measure for assessing CLIP's sensitivity to paraphrased inputs, and empirically analyzes its robustness and bias using the Social Counterfactuals dataset.
Findings
Robustness varies across different paraphrasing strategies.
Subtle differences in robustness are observed between gender-associated queries.
CLIP's stability is influenced by paraphrasing and demographic factors.
Abstract
Contrastive Language-Image Pre-training (CLIP) is a widely used multimodal model that aligns text and image representations through large-scale training. While it performs strongly on zero-shot and few-shot tasks, its robustness to linguistic variation, particularly paraphrasing, remains underexplored. Paraphrase robustness is essential for reliable deployment, especially in socially sensitive contexts where inconsistent representations can amplify demographic biases. In this paper, we introduce the Paraphrase Ranking Stability Metric (PRSM), a novel measure for quantifying CLIP's sensitivity to paraphrased queries. Using the Social Counterfactuals dataset, a benchmark designed to reveal social and demographic biases, we empirically assess CLIP's stability under paraphrastic variation, examine the interaction between paraphrase robustness and gender, and discuss implications for…
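The summary above does not give the PRSM formula, so the following is not the authors' metric. It is a minimal sketch of one generic way to quantify ranking stability under paraphrasing: rank images by cosine similarity to a query embedding, then average the Jaccard overlap of the top-k sets between the original query and each paraphrase. The function names, the choice of Jaccard overlap, and the toy 2-D vectors (standing in for CLIP text and image embeddings) are all assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def top_k(query_vec, image_vecs, k):
    """Indices of the k images most similar to the query."""
    scores = [cosine(query_vec, img) for img in image_vecs]
    order = sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def ranking_stability(original_vec, paraphrase_vecs, image_vecs, k=2):
    """Mean Jaccard overlap of top-k sets (hypothetical stand-in, not PRSM)."""
    base = top_k(original_vec, image_vecs, k)
    overlaps = []
    for p in paraphrase_vecs:
        other = top_k(p, image_vecs, k)
        overlaps.append(len(base & other) / len(base | other))
    return sum(overlaps) / len(overlaps)

# Toy embeddings (assumed values, not real CLIP outputs).
images = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
original = [1.0, 0.1]
paraphrases = [[1.0, 0.2], [0.1, 1.0]]  # one close paraphrase, one that drifts

score = ranking_stability(original, paraphrases, images, k=2)
print(round(score, 4))
```

A score of 1.0 would mean every paraphrase retrieves the same top-k images as the original query; lower values indicate that paraphrasing changes the retrieved set.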
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
