NegVQA: Can Vision Language Models Understand Negation?

Yuhui Zhang; Yuchang Su; Yiming Liu; Serena Yeung-Levy

arXiv:2505.22946 · cs.CL · May 30, 2025

TL;DR

NegVQA is a new benchmark designed to evaluate vision language models' understanding of negation, revealing significant performance gaps and a U-shaped scaling trend as models grow larger.

Contribution

We introduce NegVQA, a negation-focused VQA benchmark of 7,379 two-choice questions, and evaluate 20 state-of-the-art VLMs across seven model families. The models struggle with negated questions, and model size has a non-linear, U-shaped effect on negation comprehension.

Findings

01. Models perform poorly on negation questions compared to original ones.

02. Performance drops initially with increasing model size, then improves at larger scales.

03. NegVQA exposes critical gaps in current VLMs' negation understanding.

Abstract

Negation is a fundamental linguistic phenomenon that can entirely reverse the meaning of a sentence. As vision language models (VLMs) continue to advance and are deployed in high-stakes applications, assessing their ability to comprehend negation becomes essential. To address this, we introduce NegVQA, a visual question answering (VQA) benchmark consisting of 7,379 two-choice questions covering diverse negation scenarios and image-question distributions. We construct NegVQA by leveraging large language models to generate negated versions of questions from existing VQA datasets. Evaluating 20 state-of-the-art VLMs across seven model families, we find that these models struggle significantly with negation, exhibiting a substantial performance drop compared to their responses to the original questions. Furthermore, we uncover a U-shaped scaling trend, where increasing model size initially…
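The comparison the abstract describes, accuracy on original questions versus accuracy on their negated versions, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the `Item` record, its field names, and the toy questions and predictions are all hypothetical and are not results from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    """One two-choice VQA question with a model prediction (hypothetical schema)."""
    question: str
    choices: tuple[str, str]
    answer: int     # index (0 or 1) of the correct choice
    predicted: int  # index (0 or 1) the model selected


def accuracy(items: list[Item]) -> float:
    """Fraction of items where the predicted choice matches the answer."""
    if not items:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(it.predicted == it.answer for it in items) / len(items)


def negation_gap(original: list[Item], negated: list[Item]) -> float:
    """Accuracy on original questions minus accuracy on negated ones."""
    return accuracy(original) - accuracy(negated)


# Toy data for illustration only; not taken from NegVQA.
original = [
    Item("Which animal is in the image?", ("cat", "dog"), 0, 0),
    Item("What color is the car?", ("red", "blue"), 1, 1),
    Item("Which object is on the table?", ("cup", "book"), 0, 0),
    Item("What is the person holding?", ("phone", "umbrella"), 1, 1),
]
negated = [
    Item("Which animal is not in the image?", ("cat", "dog"), 1, 0),
    Item("What color is the car not?", ("red", "blue"), 0, 0),
    Item("Which object is not on the table?", ("cup", "book"), 1, 0),
    Item("What is the person not holding?", ("phone", "umbrella"), 0, 0),
]

gap = negation_gap(original, negated)
```

On a two-choice benchmark, random guessing gives 50% accuracy in expectation, so a negated-question accuracy near that level would suggest the model is not using the negation at all.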

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning