TL;DR
This paper introduces CXR-ContraBench, a benchmark for evaluating negated-option attraction failures in medical vision-language models. It highlights the clinical risk these failures carry and proposes a deterministic repair.
Contribution
The paper presents a new diagnostic benchmark and a repair method for negation errors in medical VLMs, improving clinical reliability without retraining.
Findings
Medical VLMs show substantial failure rates when handling negated answer options.
Chain-of-thought prompting reduces but does not eliminate negation errors.
QCCV-Neg repair significantly improves model accuracy on polarity-confused cases.
Abstract
When a chest X-ray shows consolidation but the question asks which finding is present, a medical vision-language model may answer "No consolidation." This is more than an incorrect choice: it is a polarity reversal that emits a clinical statement contradicting the image. We study this failure as negated-option attraction, where a model is drawn to a negated answer option even when it conflicts with both the visual evidence and the question. We introduce CXR-ContraBench (Chest X-Ray Contradiction Benchmark), a diagnostic benchmark spanning internal ReXVQA slices and external OpenI and CheXpert protocols. The benchmark centers on present-finding questions, where selecting "No X" despite visible X creates the main clinical risk, and uses absent-finding questions as secondary tests of whether models copy negated wording. Across CheXpert protocols, the failure is substantial and persistent.…
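The failure described above can be made concrete with a small check. The following is a minimal sketch, not the benchmark's actual scoring code: it assumes each record stores the model's chosen option and the finding visible in the image, and the negation cues, function names, and record keys (`chosen`, `present_finding`) are all hypothetical choices made for illustration.

```python
import re

# Hypothetical negation cues; the benchmark's real matching rules are not stated here.
NEGATION_PREFIX = re.compile(r"^(no|without|absence of)\s+", re.IGNORECASE)


def strip_negation(option: str) -> tuple[bool, str]:
    """Return (is_negated, finding) for an option such as 'No consolidation'."""
    text = option.strip().rstrip(".")
    match = NEGATION_PREFIX.match(text)
    if match:
        return True, text[match.end():].lower()
    return False, text.lower()


def is_negated_option_attraction(chosen: str, present_finding: str) -> bool:
    """True when the chosen option negates a finding that is visible in the image."""
    negated, finding = strip_negation(chosen)
    return negated and finding == present_finding.strip().lower()


def attraction_rate(records: list[dict]) -> float:
    """Fraction of present-finding questions answered with the negated finding."""
    if not records:
        return 0.0
    hits = sum(
        is_negated_option_attraction(r["chosen"], r["present_finding"])
        for r in records
    )
    return hits / len(records)
```

Under these assumptions, choosing "No consolidation" when consolidation is present counts as a polarity reversal, while choosing an unrelated wrong option such as "No edema" counts as an ordinary error and is excluded from the rate.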
