Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models

Junxin Wang; Dai Guan; Weijie Qiu; Zhihang Li; Yongbo Gai; Zhengyi Yang; Mengyu Zhou; Erchao Zhao; Xiaoxi Jiang; Guanjun Jiang

arXiv:2603.16253 · cs.CV · May 12, 2026

TL;DR

This paper proposes EVPV, a verification method for vision-language process reward models that explicitly checks the visual premises each reasoning step relies on, yielding more reliable step scores and better reranking accuracy.

Contribution

The paper introduces EVPV, a lightweight verification interface that decouples perception from reasoning in vision-language process reward models, improving step verification and reranking without additional tool calls.

Findings

1. EVPV improves step-level verification accuracy.
2. It boosts Best-of-N reranking performance across benchmarks.
3. Performance degrades monotonically with constraint corruption.
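
For readers unfamiliar with Best-of-N reranking, the idea can be sketched as follows: each of N candidate solutions receives per-step scores from a process reward model, the step scores are aggregated into one number per candidate, and the highest-scoring candidate is kept. This is a minimal illustration, not the paper's implementation. The function names `aggregate_min` and `best_of_n` are hypothetical, and the use of the minimum step score as the aggregate is an assumption (the page does not say which aggregation EVPV uses).

```python
from typing import Callable, List, Sequence


def aggregate_min(step_scores: Sequence[float]) -> float:
    """Score a candidate by its weakest step (an assumed aggregation rule)."""
    return min(step_scores) if step_scores else 0.0


def best_of_n(
    candidates_step_scores: List[List[float]],
    aggregate: Callable[[Sequence[float]], float] = aggregate_min,
) -> int:
    """Return the index of the candidate with the highest aggregated score."""
    if not candidates_step_scores:
        raise ValueError("need at least one candidate")
    totals = [aggregate(scores) for scores in candidates_step_scores]
    # Ties resolve to the earliest candidate, since max keeps the first maximum.
    return max(range(len(totals)), key=totals.__getitem__)


# Three hypothetical candidates with made-up step scores.
scores = [
    [0.9, 0.2, 0.8],  # one weak step
    [0.7, 0.6, 0.7],  # uniformly decent
    [0.95, 0.1],      # one very weak step
]
chosen = best_of_n(scores)
```

Under the minimum rule, a single badly scored step sinks a candidate, which is why the reliability of individual step scores matters for reranking.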

Abstract

Vision-language process reward models (VL-PRMs) are increasingly used to score intermediate reasoning steps and rerank candidates under test-time scaling. However, they often function as black-box judges: a low step score may reflect a genuine reasoning mistake or simply the verifier's misperception of the image. This entanglement between perception and reasoning leads to systematic false positives (rewarding hallucinated visual premises) and false negatives (penalizing correct grounded statements), undermining both reranking and error localization. We introduce Explicit Visual Premise Verification (EVPV), a lightweight verification interface that conditions step scoring on the reliability of the visual premises a step depends on. The policy is prompted to produce a step-wise visual checklist that makes required visual facts explicit, while a constraint extractor independently derives…
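
The abstract is truncated here, so the exact mechanism is not available on this page. As a rough sketch of what "conditioning step scoring on the reliability of the visual premises" could look like, the example below checks each item in a step's visual checklist against an independently extracted set of constraints and scales the raw step score by the fraction of supported premises. Everything in it is an assumption for illustration: the `Step` type, the exact-string matching, and the multiplicative gating rule are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Step:
    text: str
    raw_score: float  # score from a baseline PRM, in [0, 1]
    visual_premises: List[str] = field(default_factory=list)  # checklist items


def premise_reliability(premises: List[str], constraints: Set[str]) -> float:
    """Fraction of a step's visual premises found among the extracted constraints.

    Exact string matching is a stand-in; a real system would need a softer match.
    A step with no visual premises is treated as fully reliable.
    """
    if not premises:
        return 1.0
    supported = sum(1 for p in premises if p in constraints)
    return supported / len(premises)


def gated_score(step: Step, constraints: Set[str]) -> float:
    """Scale the raw step score by premise reliability (assumed gating rule)."""
    return step.raw_score * premise_reliability(step.visual_premises, constraints)


# Hypothetical constraints extracted from an image, and two reasoning steps.
constraints = {"AB = 3", "angle ABC is a right angle"}
grounded = Step("Use AB = 3.", raw_score=0.8, visual_premises=["AB = 3"])
shaky = Step("Use AB = 3 and BC = 5.", raw_score=0.8,
             visual_premises=["AB = 3", "BC = 5"])

grounded_result = gated_score(grounded, constraints)
shaky_result = gated_score(shaky, constraints)
```

With this rule, a step that leans on an unsupported visual premise is scored lower than an equally fluent step whose premises all check out, which is the kind of false positive the abstract describes.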

Code & Models

Repositories

Qwen-Applications/EVPV-PRM (GitHub)