TIM-PRM: Verifying multimodal reasoning with Tool-Integrated PRM
Peng Kuang, Xiangxiang Wang, Wentao Liu, Jian Dong, Kaidi Xu

TL;DR
TIM-PRM is an active, tool-augmented verification framework for multimodal reasoning in large language models. It reduces hallucinations and logical errors through explicit strategy planning and by querying external tools for evidence.
Contribution
The paper presents TIM-PRM, a novel agentic framework that transforms verification into an active investigation using external tools, improving multimodal reasoning accuracy and interpretability.
Findings
TIM-PRM outperforms existing multimodal PRMs on VisualProcessBench.
TIM-PRM surpasses larger models like Qwen2.5-72B and InternVL-78B in verification tasks.
The framework provides interpretable insights into the verification process.
Abstract
Multimodal Large Language Models (MLLMs) have achieved impressive performance in mathematical reasoning, yet they remain vulnerable to visual hallucinations and logical inconsistencies that standard outcome-based supervision fails to mitigate. While Process Reward Models (PRMs) promise step-by-step verification, current approaches typically operate as scalar scorers or generative critics that suffer from sycophancy, blindly validating flawed hypotheses rather than grounding them in visual reality. To bridge this gap, we introduce TIM-PRM (Tool-Integrated Multimodal PRM), a novel agentic framework that transforms verification from a passive classification task into an active, tool-augmented investigation. TIM-PRM is trained to explicitly plan verification strategies and utilizes a mechanism of Independent Question Asking to query evidence via external tools, effectively decoupling…
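The verification loop described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: every name below (verify_steps, plan_question, ask_tool, judge, IMAGE_FACTS) is hypothetical, and a lookup table stands in for a real visual tool. The one idea it tries to illustrate is that the tool receives only the planned question and never the step's own claim, so its answer cannot simply echo the hypothesis under review.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepVerdict:
    step: str
    question: str
    evidence: str
    correct: bool


def verify_steps(
    steps: List[str],
    plan_question: Callable[[str], str],
    ask_tool: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> List[StepVerdict]:
    """Check each reasoning step against independently gathered evidence."""
    verdicts = []
    for step in steps:
        # Plan: decide what has to be checked for this step.
        question = plan_question(step)
        # Query: the tool sees only the question, not the step's claim.
        evidence = ask_tool(question)
        # Judge: compare the step with the evidence.
        verdicts.append(StepVerdict(step, question, evidence, judge(step, evidence)))
    return verdicts


# Toy stand-ins (all hypothetical): a lookup table plays the role of a visual tool.
IMAGE_FACTS = {"How many apples are in the image?": "3"}


def plan_question(step: str) -> str:
    # A real planner would be a trained model; here every step maps to one question.
    return "How many apples are in the image?"


def ask_tool(question: str) -> str:
    return IMAGE_FACTS.get(question, "unknown")


def judge(step: str, evidence: str) -> bool:
    # Accept the step only if the tool's answer appears as a word in it.
    return evidence in step.split()


verdicts = verify_steps(
    ["The image shows 3 apples", "The image shows 5 apples"],
    plan_question,
    ask_tool,
    judge,
)
print([v.correct for v in verdicts])
```

In this toy run the first step agrees with the tool's answer and the second does not. Each verdict also keeps the question and the evidence, which gives a rough picture of how such a verifier's decisions can be inspected.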
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
