Do Reasoning Vision-Language Models Inversely Scale in Test-Time Compute? A Distractor-centric Empirical Analysis
Jiyun Bae, Hyunjong Ok, Sangwoo Mo, Jaeho Lee

TL;DR
This paper investigates how distractors affect test-time reasoning in vision-language models, shows that visual distractors behave fundamentally differently from textual ones, and proposes a prompting-based mitigation.
Contribution
The paper introduces Idis (Images with distractors), a visual question-answering dataset that systematically varies distractors along semantic, numerical, and spatial dimensions, and uses it to analyze how distractors affect model reasoning and accuracy compared with textual distractors.
Findings
Visual distractors reduce accuracy without increasing reasoning length.
Tracking attribute counts within reasoning traces helps explain how distractors, reasoning length, and accuracy interact.
A proposed prompting strategy mitigates bias-driven predictions.
Abstract
How does irrelevant information (i.e., distractors) affect test-time scaling in vision-language models (VLMs)? Prior studies on language models have reported an inverse scaling effect, where textual distractors lead to longer but less effective reasoning. To investigate whether similar phenomena occur in multimodal settings, we introduce Idis (Images with distractors), a visual question-answering dataset that systematically varies distractors along semantic, numerical, and spatial dimensions. Our analyses reveal that visual distractors differ fundamentally from textual ones: although inverse scaling persists, adding visual distractors reduces accuracy without increasing reasoning length. We further show that tracking attribute counts within reasoning traces provides key insights into how distractors, reasoning length, and accuracy interact. Finally, we demonstrate that these trends…
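The abstract does not say how attribute counts are extracted from reasoning traces. As a rough illustration only, the sketch below assumes an attribute is a plain word (for example a color) and counts whole-word, case-insensitive mentions in a trace. The function name `count_attribute_mentions`, the attribute list, and the sample trace are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def count_attribute_mentions(trace, attributes):
    """Count whole-word, case-insensitive mentions of each attribute.

    Assumption: an "attribute" is a literal word or phrase; the paper's
    actual extraction procedure is not described in this summary.
    """
    lowered = trace.lower()
    counts = Counter()
    for attr in attributes:
        # \b anchors keep "red" from matching inside words like "reduced".
        pattern = r"\b" + re.escape(attr.lower()) + r"\b"
        counts[attr] = len(re.findall(pattern, lowered))
    return counts


# Hypothetical reasoning trace produced by a model for one image question.
trace = "The mug is red. The red mug sits left of a blue plate. So the answer is red."
counts = count_attribute_mentions(trace, ["red", "blue", "green"])
total_mentions = sum(counts.values())
```

Per-trace counts like these could then be compared against trace length and answer correctness, which is the kind of relationship the abstract says attribute tracking helps expose.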
Taxonomy
Topics: Multimodal Machine Learning Applications · Neurobiology of Language and Bilingualism · Language, Metaphor, and Cognition
