Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions

David Acuna; Ximing Lu; Jaehun Jung; Hyunwoo Kim; Amlan Kar; Sanja Fidler; Yejin Choi

arXiv:2506.08927·cs.CV·June 11, 2025

TL;DR

This paper introduces Socratic-MCTS, a search-based method that prompts vision-language models with subquestions to enhance their reasoning capabilities without additional training, leading to improved performance on reasoning benchmarks.

Contribution

The paper proposes an MCTS-inspired algorithm that elicits reasoning from pre-trained models by injecting subquestion-subanswer pairs into the model's output stream, enabling extended reasoning without retraining.
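To make the idea of searching over injected subquestions concrete, here is a minimal sketch of a generic UCT-style tree search over subquestion-subanswer traces. It is not the authors' implementation: the `propose`, `answer`, and `reward` callables are hypothetical stand-ins for calls to a vision-language model and a scoring rule, and the toy stubs at the bottom exist only so the sketch runs end to end.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

QA = Tuple[str, str]  # one (subquestion, subanswer) pair


@dataclass
class Node:
    trace: List[QA]                       # pairs injected so far
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0                    # sum of rewards backed up through this node


def uct(child: Node, c: float = 1.4) -> float:
    """Standard UCT score; unvisited children are explored first."""
    if child.visits == 0:
        return math.inf
    mean = child.value / child.visits
    return mean + c * math.sqrt(math.log(child.parent.visits) / child.visits)


def search(
    propose: Callable[[List[QA]], List[str]],  # hypothetical: suggests next subquestions
    answer: Callable[[List[QA], str], str],    # hypothetical: answers one subquestion
    reward: Callable[[List[QA]], float],       # hypothetical: scores a partial trace
    iterations: int = 50,
    max_depth: int = 2,
) -> List[QA]:
    root = Node(trace=[])
    for _ in range(iterations):
        # Selection: descend by UCT until reaching a node with no children.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # Expansion: grow a leaf once it has been visited (the root expands at once).
        if len(node.trace) < max_depth and (node.visits > 0 or node.parent is None):
            for q in propose(node.trace):
                pair = (q, answer(node.trace, q))
                node.children.append(Node(trace=node.trace + [pair], parent=node))
            if node.children:
                node = node.children[0]
        # Evaluation, then backpropagation along the path to the root.
        r = reward(node.trace)
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Read out the most-visited path as the chosen subquestion trace.
    node = root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
    return node.trace


# Toy stubs (assumptions for illustration only; a real system would query a VLM).
CANDIDATES = [
    "What objects are shown?",
    "What unit does the axis use?",
    "What color is the border?",
]
USEFUL = set(CANDIDATES[:2])


def toy_propose(trace: List[QA]) -> List[str]:
    asked = {q for q, _ in trace}
    return [q for q in CANDIDATES if q not in asked]


def toy_answer(trace: List[QA], question: str) -> str:
    return f"(stub answer to: {question})"


def toy_reward(trace: List[QA]) -> float:
    return sum(q in USEFUL for q, _ in trace) / 2


trace = search(toy_propose, toy_answer, toy_reward)
for q, a in trace:
    print(q, "->", a)
```

In this sketch each tree node holds the list of pairs injected so far, so a root-to-leaf path corresponds to one candidate reasoning trace. The reward function is where a real system would need a signal of its own; the fraction-of-useful-questions rule used here is purely a placeholder.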

Findings

01 Achieves a 2% overall improvement on the MMMU-PRO benchmark.

02 Yields a 9% gain in the Liberal Arts category.

03 Demonstrates consistent reasoning improvements across three benchmarks.

Abstract

Recent research in vision-language models (VLMs) has centered around the possibility of equipping them with implicit long-form chain-of-thought reasoning -- akin to the success observed in language models -- via distillation and reinforcement learning. But what about the non-reasoning models already trained and deployed across the internet? Should we simply abandon them, or is there hope for a search mechanism that can elicit hidden knowledge and induce long reasoning traces -- without any additional training or supervision? In this paper, we explore this possibility using a Monte Carlo Tree Search (MCTS)-inspired algorithm, which injects subquestion-subanswer pairs into the model's output stream. We show that framing reasoning as a search process -- where subquestions act as latent decisions within a broader inference trajectory -- helps the model "connect the dots" between fragmented…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI