When Negation Is a Geometry Problem in Vision-Language Models

Fawaz Sammani; Tzoulio Chamiti; Paul Gavrikov; Nikos Deligiannis

arXiv:2603.20554·cs.CV·April 6, 2026

TL;DR

This paper investigates negation understanding in vision-language models, proposing a new evaluation framework, identifying a negation-related direction in CLIP embeddings, and demonstrating test-time intervention to improve negation awareness without fine-tuning.

Contribution

The paper introduces a multimodal LLM-based evaluation of negation understanding, identifies a negation-related direction in the CLIP embedding space, and shows that test-time manipulation can improve negation understanding.

Findings

01

A negation-related direction exists in CLIP embedding space.

02

Test-time intervention can steer CLIP toward negation-aware behavior.

03

The proposed evaluation framework provides a more reliable measure of negation understanding.
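
To make findings 01 and 02 concrete, here is a minimal sketch of what a "direction" in an embedding space and a test-time intervention along it could look like. It does not reproduce the paper's method: the difference-of-means estimator, the additive steering rule, the strength `alpha`, and the helper names (`negation_direction`, `steer`) are all assumptions chosen for illustration, and the vectors are random stand-ins rather than real CLIP text embeddings.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def negation_direction(affirmative, negated):
    """Unit vector from the mean affirmative embedding to the mean negated one.

    Assumed estimator (difference of means); the paper may use another.
    """
    return l2_normalize(negated.mean(axis=0) - affirmative.mean(axis=0))


def steer(text_embedding, direction, alpha):
    """Shift a text embedding along `direction` by `alpha`, then renormalize."""
    return l2_normalize(text_embedding + alpha * direction)


# Synthetic stand-ins for paired caption embeddings (hypothetical data):
# each "negated" vector is its affirmative counterpart shifted along one
# hidden axis, which the estimator should approximately recover.
rng = np.random.default_rng(0)
dim = 64
hidden_axis = l2_normalize(rng.normal(size=dim))
affirmative = l2_normalize(rng.normal(size=(200, dim)))
negated = l2_normalize(affirmative + 0.5 * hidden_axis)

direction = negation_direction(affirmative, negated)

# Test-time intervention: no weights change, only the query embedding moves.
query = l2_normalize(rng.normal(size=dim))
steered = steer(query, direction, alpha=0.5)
```

In a real setting the two arrays would hold CLIP text embeddings of matched affirmative and negated captions, and the steered query would be compared against image embeddings by cosine similarity as usual.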

Abstract

Joint Vision-Language Embedding models such as CLIP typically fail at understanding negation in text queries, for example, failing to distinguish "no" in the query: "a plain blue shirt with no logos". Prior work has largely addressed this limitation through data-centric approaches, fine-tuning CLIP on large-scale synthetic negation datasets. However, these efforts are commonly evaluated using retrieval-based metrics that cannot reliably reflect whether negation is actually understood. In this paper, we identify two key limitations of such evaluation metrics and investigate an alternative evaluation framework based on Multimodal LLMs-as-a-judge, which typically excel at understanding simple yes/no questions about image content, providing a fair evaluation of negation understanding in CLIP models. We then ask whether there already exists a direction in the CLIP embedding space associated…

Code & Models

Repositories

fawazsammani/negation-steering
github

Datasets

mrTzou/N-COCO
dataset · 53 dl
