Limits and Gains of Test-Time Scaling in Vision-Language Reasoning
Mohammadjavad Ahmadpour, Amirmahdi Meighani, Payam Taebi, Omid Ghahroodi, Amirmohammad Izadi, Mahdieh Soleymani Baghshah

TL;DR
This paper systematically evaluates test-time scaling in vision-language models, revealing its variable effectiveness depending on model type, task, and dataset, and highlighting the need for adaptive strategies.
Contribution
It provides the first comprehensive empirical analysis of test-time scaling in vision-language models across diverse benchmarks and model types.
Findings
Closed-source models benefit from structured reasoning and self-refinement.
Open-source models show inconsistent gains, with external verification being most reliable.
Gains from test-time scaling (TTS) are dataset-dependent: clear on multi-step reasoning tasks, limited on perception tasks.
Abstract
Test-time scaling (TTS) has emerged as a powerful paradigm for improving the reasoning ability of Large Language Models (LLMs) by allocating additional computation at inference, yet its application to multimodal systems such as Vision-Language Models (VLMs) remains underexplored. In this work, we present a systematic empirical study of inference-time reasoning methods applied to both open-source and closed-source VLMs across different benchmarks. Our results reveal that while closed-source models consistently benefit from structured reasoning and iterative self-refinement, open-source VLMs behave inconsistently: external verification provides the most reliable gains, whereas iterative refinement often degrades performance. We further find that the effectiveness of TTS is dataset-dependent, yielding clear improvements on multi-step reasoning tasks but offering only limited gains on…
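To make the two families of methods named in the abstract concrete, here is a minimal Python sketch of external verification (shown as best-of-N selection) and iterative self-refinement. It is not the paper's implementation. The names `best_of_n`, `self_refine`, `make_toy_generator`, and `toy_verifier` are hypothetical, and the model, verifier, and refiner are stand-in callables rather than real VLM calls.

```python
import itertools
from typing import Callable, List

# Hypothetical interfaces; a real system would wrap VLM API calls here.
Generator = Callable[[str], str]        # prompt -> candidate answer
Verifier = Callable[[str, str], float]  # (prompt, answer) -> score
Refiner = Callable[[str, str], str]     # (prompt, previous answer) -> revised answer


def best_of_n(prompt: str, generate: Generator, verify: Verifier, n: int = 4) -> str:
    """External verification: sample n candidates, keep the highest-scored one."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: verify(prompt, answer))


def self_refine(prompt: str, generate: Generator, refine: Refiner, rounds: int = 2) -> str:
    """Iterative self-refinement: draft once, then revise the draft `rounds` times."""
    answer = generate(prompt)
    for _ in range(rounds):
        answer = refine(prompt, answer)
    return answer


def make_toy_generator(answers: List[str]) -> Generator:
    """Toy stand-in for sampling: cycles deterministically through fixed answers."""
    pool = itertools.cycle(answers)
    return lambda prompt: next(pool)


def toy_verifier(prompt: str, answer: str) -> float:
    """Toy stand-in for a verifier: rewards only the answer "4"."""
    return 1.0 if answer == "4" else 0.0


if __name__ == "__main__":
    picked = best_of_n("What is 2 + 2?", make_toy_generator(["3", "4", "5"]), toy_verifier, n=3)
    print(picked)
```

Both strategies spend extra inference compute, but in different ways: best-of-N relies on a separate scorer to choose among independent samples, while self-refinement feeds the model's own output back to it, so any error in the first draft can carry through later rounds.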
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
