Vision-Language Models Do Not Understand Negation

Kumail Alhamoud; Shaden Alshammari; Yonglong Tian; Guohao Li; Philip Torr; Yoon Kim; Marzyeh Ghassemi

arXiv:2501.09425 · cs.CV · May 14, 2025 · 2 cites

Open Access

TL;DR

This paper evaluates how well current vision-language models understand negation, introduces the NegBench benchmark, and demonstrates that fine-tuning on synthetic negation data significantly improves their performance.

Contribution

The study introduces NegBench, a comprehensive benchmark for negation understanding, and shows that fine-tuning models on synthetic negation data enhances their negation comprehension.

Findings

01

Modern VLMs often perform at chance level on negation tasks.

02

Fine-tuning on synthetic negation datasets improves recall on negated queries by 10%.

03

Accuracy on multiple-choice questions with negated captions increases by 28% after fine-tuning.

Abstract

Many practical vision-language applications require models that understand negation, e.g., when using natural language to retrieve images which contain certain objects but not others. Despite advancements in vision-language models (VLMs) through large-scale training, their ability to comprehend negation remains underexplored. This study addresses the question: how well do current VLMs understand negation? We introduce NegBench, a new benchmark designed to evaluate negation understanding across 18 task variations and 79k examples spanning image, video, and medical datasets. The benchmark consists of two core tasks designed to evaluate negation understanding in diverse multimodal settings: Retrieval with Negation and Multiple Choice Questions with Negated Captions. Our evaluation reveals that modern VLMs struggle significantly with negation, often performing at chance level. To address…
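
The two task types named in the abstract can be illustrated with a minimal scoring sketch. This is not the NegBench evaluation code: the function names, the toy similarity scores, and the choice of four answer options are all assumptions made for illustration. It only shows how recall@k (retrieval) and accuracy (multiple choice) are commonly computed from image-text similarity scores, and what "chance level" means for a multiple-choice task.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of text queries whose target image is among the k highest-scoring images."""
    hits = 0
    for row, target in zip(scores, targets):
        # Rank image indices from highest to lowest similarity for this query.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)


def mcq_accuracy(scores: Sequence[Sequence[float]], answers: Sequence[int]) -> float:
    """Fraction of images for which the highest-scoring caption is the correct one."""
    correct = 0
    for row, answer in zip(scores, answers):
        predicted = max(range(len(row)), key=lambda i: row[i])
        if predicted == answer:
            correct += 1
    return correct / len(answers)


# Hypothetical similarity scores (made up, not results from the paper).
# Retrieval: one row per negated text query, one column per candidate image.
retrieval_scores = [
    [0.9, 0.2, 0.1],  # target image 0 is ranked first
    [0.3, 0.5, 0.4],  # target image 2 is ranked second
]
retrieval_targets = [0, 2]

# Multiple choice: one row per image, one column per candidate caption.
# Four options per question is an assumption of this sketch.
mcq_scores = [
    [0.1, 0.7, 0.3, 0.2],  # correct caption 1 scores highest
    [0.6, 0.2, 0.5, 0.1],  # correct caption 2 loses to caption 0
]
mcq_answers = [1, 2]

r_at_1 = recall_at_k(retrieval_scores, retrieval_targets, k=1)
r_at_2 = recall_at_k(retrieval_scores, retrieval_targets, k=2)
accuracy = mcq_accuracy(mcq_scores, mcq_answers)
chance_level = 1 / len(mcq_scores[0])  # uniform guessing over the options

print(r_at_1, r_at_2, accuracy, chance_level)
```

Under these toy inputs, a model whose multiple-choice accuracy stays near `chance_level` is doing no better than guessing uniformly among the options.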

Taxonomy

Topics: Speech and dialogue systems · Language, Metaphor, and Cognition · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training