Mitigating Easy Option Bias in Multiple-Choice Question Answering
Hao Zhang, Chen Li, Basura Fernando

TL;DR
This paper identifies an Easy-Options Bias in multiple-choice VQA benchmarks, which lets models pick the correct answer from the visual input and the options alone, without using the question, and proposes GroundAttack, which generates harder negative options to mitigate this bias.
Contribution
The paper introduces GroundAttack, a toolkit for creating hard negative options, and provides new EOB-free annotations for better evaluation of vision-language models.
Findings
Current VLMs perform near random on EOB-free datasets
GroundAttack effectively generates visually plausible hard negatives
New annotations reduce bias, leading to more accurate assessments
Abstract
In this early study, we observe an Easy-Options Bias (EOB) issue in some multiple-choice Visual Question Answering (VQA) benchmarks such as MMStar, RealWorldQA, SEED-Bench, NExT-QA, the STAR benchmark and Video-MME. This bias allows vision-language models (VLMs) to select the correct answer using only the vision (V) and options (O) as inputs, without the need for the question (Q). Through grounding experiments, we attribute the bias to an imbalance in visual relevance: the correct answer typically aligns more closely with the visual content than the negative options do in feature space, creating a shortcut for VLMs to infer the answer via simple vision-option similarity matching. To fix this, we introduce GroundAttack, a toolkit that automatically generates hard negative options that are as visually plausible as the correct answer. We apply it to the NExT-QA and MMStar datasets, creating new EOB-free…
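The vision-option similarity shortcut described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy vectors stand in for embeddings from some vision-language encoder, and the function names `pick_without_question` and `rank_hard_negatives` are hypothetical. The second function only shows the general idea of preferring distractors that sit close to the visual input; it is not GroundAttack's actual procedure, which the text above does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_without_question(image_vec, option_vecs):
    """Answer using only vision (V) and options (O): choose the option
    whose embedding is most similar to the image embedding."""
    scores = [cosine(image_vec, o) for o in option_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


def rank_hard_negatives(image_vec, candidate_vecs, k):
    """Hypothetical helper: return indices of the k candidate distractors
    most similar to the image, i.e. the most visually plausible ones."""
    ranked = sorted(
        range(len(candidate_vecs)),
        key=lambda i: cosine(image_vec, candidate_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (assumed values, not real model outputs).
image = [1.0, 0.0, 0.2]
options = [
    [0.9, 0.1, 0.3],  # correct answer: close to the image
    [0.0, 1.0, 0.0],  # easy negative: unrelated to the image
    [0.1, 0.0, 1.0],  # easy negative: weakly related
]
choice, scores = pick_without_question(image, options)

# Candidate replacement distractors, ranked by visual plausibility.
candidates = [[0.0, 1.0, 0.0], [0.8, 0.2, 0.1], [0.5, 0.5, 0.5]]
hard = rank_hard_negatives(image, candidates, 2)
```

In this toy case the question never enters the computation, yet the correct option wins because the negatives are far from the image in feature space. Replacing them with the higher-ranked candidates narrows that gap, which is the kind of imbalance the abstract says hard negatives are meant to remove.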
