When Negation Is a Geometry Problem in Vision-Language Models
Fawaz Sammani, Tzoulio Chamiti, Paul Gavrikov, Nikos Deligiannis

TL;DR
This paper investigates negation understanding in vision-language models, proposing a new evaluation framework, identifying a negation-related direction in CLIP embeddings, and demonstrating test-time intervention to improve negation awareness without fine-tuning.
Contribution
The paper introduces a multimodal-LLM-based evaluation of negation understanding, identifies a negation-related direction in the CLIP embedding space, and shows that test-time manipulation can improve negation understanding without fine-tuning.
Findings
A negation-related direction exists in CLIP embedding space.
Test-time intervention can steer CLIP toward negation-aware behavior.
The proposed evaluation framework provides a more reliable measure of negation understanding.
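The first two findings can be illustrated with a minimal sketch. It is not the authors' method: the summary does not say how the direction is estimated or how the intervention is applied, so the difference-of-means estimator, the additive shift, the strength `alpha`, and all function names below are assumptions made for illustration. Random vectors stand in for CLIP text embeddings so the example runs without model weights.

```python
import numpy as np


def unit(x):
    """L2-normalize along the last axis, as is conventional for CLIP embeddings."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def negation_direction(affirmative, negated):
    """Estimate a direction as the difference of class means.

    Assumption: rows are embeddings of paired captions, e.g.
    "a shirt with logos" vs. "a shirt with no logos".
    """
    return unit(negated.mean(axis=0) - affirmative.mean(axis=0))


def steer(text_emb, direction, alpha):
    """Shift a text embedding along the direction, then re-normalize.

    `alpha` is a hypothetical steering strength, not a value from the paper.
    """
    return unit(text_emb + alpha * direction)


# Synthetic stand-ins for CLIP text embeddings (no real model is loaded).
rng = np.random.default_rng(0)
dim = 64
true_dir = unit(rng.normal(size=dim))            # planted "negation" axis
affirmative = unit(rng.normal(size=(200, dim)))  # affirmative captions
negated = unit(affirmative + 0.5 * true_dir)     # same captions, shifted

d = negation_direction(affirmative, negated)

# Test-time intervention: no weights change, only the query embedding moves.
query = unit(rng.normal(size=dim))
steered = steer(query, d, alpha=1.0)
```

With real data, `affirmative` and `negated` would come from a CLIP text encoder, and the steered query would be compared against image embeddings by cosine similarity as usual. Whether such a shift actually improves negation handling is the empirical question the paper studies; the sketch only shows the mechanics.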
Abstract
Joint Vision-Language Embedding models such as CLIP typically fail at understanding negation in text queries, for example, failing to distinguish "no" in the query: "a plain blue shirt with no logos". Prior work has largely addressed this limitation through data-centric approaches, fine-tuning CLIP on large-scale synthetic negation datasets. However, these efforts are commonly evaluated using retrieval-based metrics that cannot reliably reflect whether negation is actually understood. In this paper, we identify two key limitations of such evaluation metrics and investigate an alternative evaluation framework based on Multimodal LLMs-as-a-judge, which typically excel at understanding simple yes/no questions about image content, providing a fair evaluation of negation understanding in CLIP models. We then ask whether there already exists a direction in the CLIP embedding space associated…
