What, Whether and How? Unveiling Process Reward Models for Thinking with Images Reasoning

Yujin Zhou; Pengcheng Wen; Jiale Chen; Boqin Yin; Han Zhu; Jiaming Ji; Juntao Dai; Chi-Min Chan; Sirui Han

arXiv:2602.08346 · cs.CV · February 10, 2026

TL;DR

This paper introduces the first comprehensive benchmark for Process Reward Models in the thinking with images paradigm, analyzing their ability to evaluate visual reasoning steps and identifying current limitations of LVLMs.

Contribution

The paper provides a detailed analysis of reasoning errors in the thinking with images paradigm, constructs a large annotated benchmark, and evaluates current LVLMs as PRMs, highlighting the need for specialized PRMs in visual reasoning.

Findings

01. Current LVLMs perform poorly as PRMs in visual reasoning.

02. Significant disparities exist across different error types.

03. Current models exhibit positive bias and are sensitive to reasoning step positions.
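The third finding can be made concrete with a small sketch. The summary does not say how the paper measures positive bias or position sensitivity, so everything below is an illustrative assumption: the function names are hypothetical, step labels are plain booleans (True for a correct step), and positive bias is taken to be the accuracy gap between gold-positive and gold-negative steps.

```python
from collections import defaultdict


def per_label_accuracy(gold, pred):
    """Accuracy computed separately for gold-positive and gold-negative steps."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, p in zip(gold, pred):
        total[g] += 1
        correct[g] += int(g == p)
    return {label: correct[label] / total[label] for label in total}


def positive_bias(gold, pred):
    """Assumed definition: accuracy on positive steps minus accuracy on negative steps.

    A large positive value means the judge tends to accept steps,
    including the ones that are actually wrong.
    """
    acc = per_label_accuracy(gold, pred)
    return acc.get(True, 0.0) - acc.get(False, 0.0)


def accuracy_by_position(gold_chains, pred_chains):
    """Accuracy grouped by the index of the step within its reasoning chain."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, pred in zip(gold_chains, pred_chains):
        for position, (g, p) in enumerate(zip(gold, pred)):
            total[position] += 1
            correct[position] += int(g == p)
    return {position: correct[position] / total[position] for position in sorted(total)}


if __name__ == "__main__":
    # Toy labels, invented for illustration only; not data from the paper.
    gold = [True, True, False, False]
    pred = [True, True, True, False]
    print(positive_bias(gold, pred))

    gold_chains = [[True, False], [True, True]]
    pred_chains = [[True, True], [True, True]]
    print(accuracy_by_position(gold_chains, pred_chains))
```

Reporting the two per-label accuracies alongside the gap is usually more informative than a single overall accuracy, because a judge that labels every step as positive can still score well when most steps are correct.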

Abstract

Large Vision Language Models (LVLMs) have advanced rapidly and now demonstrate excellent abilities in various visual tasks. Building upon these developments, the thinking with images paradigm has emerged, enabling models to dynamically edit and re-encode visual information at each reasoning step, mirroring human visual processing. However, this paradigm introduces significant challenges, as diverse errors may occur during the reasoning process. This necessitates Process Reward Models (PRMs) that distinguish positive from negative reasoning steps, yet existing benchmarks for PRMs are predominantly text-centric and lack comprehensive assessment under this paradigm. To address these gaps, this work introduces the first comprehensive benchmark specifically designed for evaluating PRMs under the thinking with images paradigm. Our main contributions are: (1) Through extensive analysis of…

Taxonomy

Topics: Multimodal Machine Learning Applications · Language, Metaphor, and Cognition · Ethics and Social Impacts of AI