When Choices Become Priors: Contrastive Decoding for Scientific Figure Multiple-Choice QA
Taeyun Roh, Eun-yeong Jo, Wonjune Jang, Jaewoo Kang

TL;DR
This paper introduces SCICON, a decoding method that improves scientific figure multiple-choice question answering by reducing the model's reliance on answer-choice priors, thereby encouraging figure-grounded reasoning.
Contribution
SCICON is a training-free contrastive decoding technique that subtracts text-only scores from image-conditioned scores to mitigate prior bias in scientific MCQA.
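The decision rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes each option already has one image-conditioned score and one text-only score, both on the same scale (e.g. log-probabilities). The function name `scicon_select` and the numbers below are hypothetical and illustrative only, not results from the paper.

```python
from typing import Sequence


def scicon_select(image_scores: Sequence[float], text_scores: Sequence[float]) -> int:
    """Return the index of the option with the largest contrastive score.

    Hypothetical helper (assumption, not the paper's code):
    image_scores[i] is option i's score given the figure and the question,
    text_scores[i] is option i's score given the question text alone.
    """
    if len(image_scores) != len(text_scores) or not image_scores:
        raise ValueError("need one image-conditioned and one text-only score per option")
    # Contrastive score: image-conditioned score minus text-only (prior) score.
    contrastive = [s_img - s_txt for s_img, s_txt in zip(image_scores, text_scores)]
    # Pick the option whose score rises most once the figure is provided.
    return max(range(len(contrastive)), key=contrastive.__getitem__)


# Illustrative log-probabilities for options A-D (made-up numbers).
image_lp = [-0.9, -1.2, -2.5, -3.0]  # with the figure, A narrowly beats B
text_lp = [-0.3, -2.0, -2.2, -2.8]   # without it, A is already strongly preferred
print("plain argmax:", "ABCD"[max(range(4), key=image_lp.__getitem__)])
print("contrastive:", "ABCD"[scicon_select(image_lp, text_lp)])
```

Here plain argmax picks A, whose high score is mostly explained by the text-only prior. The contrastive rule picks B, the option whose likelihood gains the most from seeing the figure.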
Findings
SCICON consistently improves accuracy across three benchmarks.
It reduces the influence of answer priors on model predictions.
The method is simple and does not require additional training.
Abstract
Scientific figure multiple-choice question answering (MCQA) requires models to reason over diverse visual evidence, ranging from charts and multipanel figures to microscopy and biomedical images. However, this setting suffers from a distinctive bias: answer choices themselves can act as priors, steering multimodal models toward scientifically plausible options even when the figure supports a different answer. We investigate this failure mode through a simple question: what if decoding explicitly discounts what the model would prefer from text alone, so as to favor figure-grounded evidence? To this end, we propose SCICON, a training-free decoding method that scores each candidate by subtracting a text-only option score from its image-conditioned counterpart. Unlike prior contrastive decoding approaches that mitigate hallucinations by contrasting original inputs with distorted images or…
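The abstract does not say how an "option score" is computed. One common choice for likelihood-based MCQA is the summed log-probability of the option's tokens under teacher forcing. The sketch below assumes that choice. Whether SCICON scores answer letters or full option text, and whether it length-normalizes, is not stated in this summary. The helper names and toy logits are hypothetical. The same function is applied once to image-conditioned logits and once to text-only logits, and the two results are subtracted.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one vocabulary distribution.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def option_score(step_logits: Sequence[Sequence[float]], option_tokens: Sequence[int]) -> float:
    """Summed log-probability of an option's tokens (assumed scoring, hypothetical helper).

    step_logits[t] is the model's logit vector when predicting option token t.
    """
    if len(step_logits) != len(option_tokens):
        raise ValueError("one logit vector per option token is required")
    return sum(log_softmax(logits)[tok] for logits, tok in zip(step_logits, option_tokens))


# Toy 3-token vocabulary; the option is the token sequence [2, 0].
option = [2, 0]
image_logits = [[0.0, 0.0, 2.0], [1.0, 0.0, 0.0]]  # figure makes the option more likely
text_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # text-only: uniform, no preference

s_img = option_score(image_logits, option)
s_txt = option_score(text_logits, option)
print(f"image={s_img:.3f} text={s_txt:.3f} contrastive={s_img - s_txt:.3f}")
```

Summing log-probabilities favors shorter options. If options differ in length, dividing by the token count is a common alternative, although it is again an assumption here.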
