A Nash Equilibrium Framework For Training-Free Multimodal Step Verification
Rohit Sinha, Kunal Tilaganji, Tanuja Ganu, Nagarajan Natarajan, Amit Sharma, Vineeth N. Balasubramanian

TL;DR
This paper introduces a training-free, Nash equilibrium-based verification method for multimodal reasoning that leverages disagreement among specialized judges to improve step validation without task-specific training.
Contribution
It formalizes step verification as a Nash equilibrium game, enabling disagreement-aware filtering and ranking, and demonstrates its effectiveness across multiple benchmarks.
Findings
Improves over baselines by 2.4% to 5.2% across multiple benchmarks.
Outperforms some learned critics without training.
Provides a robust, task-agnostic verification method.
Abstract
Multimodal large language models often generate reasoning chains containing subtle errors that lead to incorrect answers. Current verification approaches have notable limitations. Learned critics need extensive labeled data and show inconsistent performance across different tasks. Meanwhile, existing training-free methods simply average scores from different sources, missing a key insight: when these scores disagree, that disagreement itself carries important information about whether a reasoning step is truly valid or not. We propose a training-free verification approach that treats step-wise verification as a coordination problem among specialized judges. We formalize these judges' interaction as a Nash equilibrium game where agreement signals valid steps while disagreement reveals instability. Our method computes equilibrium scores through a closed-form solution, enabling both…
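To make the idea of a closed-form equilibrium among judges concrete, here is a minimal sketch in Python. It is not the paper's formulation, which the truncated abstract does not give. It assumes a hypothetical quadratic game in which each judge i picks a reported score x_i that stays close to its own raw score s_i while also staying close to the average report of the other judges, with a coupling weight `lam`. Under that assumption the equilibrium is x_i = m + (s_i - m) / (1 + lam * n / (n - 1)), where m is the mean raw score and n the number of judges. The names `equilibrium_scores`, `verify_step`, `lam`, and `max_spread` are illustrative only.

```python
from statistics import mean


def equilibrium_scores(raw, lam=1.0):
    """Equilibrium reports for a toy quadratic consensus game.

    ASSUMPTION (not from the paper): judge i minimizes
        (x_i - s_i)^2 + lam * (x_i - mean of the other judges' x_j)^2.
    Setting each judge's gradient to zero gives a linear system whose
    solution shrinks every raw score toward the common mean.
    """
    n = len(raw)
    if n < 2:
        return list(raw)  # a single judge has nobody to coordinate with
    m = mean(raw)
    shrink = 1.0 + lam * n / (n - 1)
    return [m + (s - m) / shrink for s in raw]


def verify_step(raw, lam=1.0, max_spread=0.2):
    """Score one reasoning step and flag it when the judges still disagree.

    `max_spread` is a hypothetical threshold on the residual disagreement
    that remains at equilibrium.
    """
    eq = equilibrium_scores(raw, lam)
    spread = max(eq) - min(eq)
    return {"score": mean(eq), "spread": spread, "stable": spread <= max_spread}


if __name__ == "__main__":
    print(verify_step([0.9, 0.9, 0.9]))  # judges agree
    print(verify_step([0.9, 0.1]))       # judges disagree
```

In this toy game the equilibrium mean equals the plain average of the raw scores, so the extra information comes from `spread`: steps whose judges remain far apart at equilibrium can be filtered out, and the remaining steps ranked by `score`.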
