StructVRM: Aligning Multimodal Reasoning with Structured and Verifiable Reward Models

Xiangxiang Zhang; Jingxuan Wei; Donghong Zhong; Qi Chen; Caijun Jia; Cheng Tan; Jinming Gu; Xiaobo Qin; Zhiping Liu; Liang Hu; Tong Sun; Yuchen Wu; Zewei Sun; Chenwei Lou; Hua Zheng; Tianyang Zhan; Changbao Wang; Shuangzhi Wu; Zefa Lin; Chang Guo; Sihang Yuan; Riwei Chen; Shixiong Zhao; Yingping Zhang; Gaowei Wu; Bihui Yu; Jiahui Wu; Zhehui Zhao; Qianqian Liu; Ruofeng Tang; Xingyue Huang; Bing Zhao; Mengyang Zhang; Youqiang Zhou

arXiv:2508.05383 · cs.AI · August 8, 2025

TL;DR

StructVRM introduces a structured, verifiable reward mechanism for multimodal reasoning models, enabling fine-grained feedback and partial credit, leading to state-of-the-art results on complex benchmarks.

Contribution

We propose StructVRM, a novel approach that aligns multimodal reasoning with structured, verifiable rewards, improving model guidance for complex, multi-part questions.

Findings

01

Achieved state-of-the-art performance on 6 out of 12 benchmarks.

02

Demonstrated effectiveness on high-difficulty STEM-Bench.

03

Validated the approach's superiority over traditional reward methods that assign a single binary score to an entire response.

Abstract

Existing Vision-Language Models often struggle with complex, multi-question reasoning tasks where partial correctness is crucial for effective learning. Traditional reward mechanisms, which provide a single binary score for an entire response, are too coarse to guide models through intricate problems with multiple sub-parts. To address this, we introduce StructVRM, a method that aligns multimodal reasoning with Structured and Verifiable Reward Models. At its core is a model-based verifier trained to provide fine-grained, sub-question-level feedback, assessing semantic and mathematical equivalence rather than relying on rigid string matching. This allows for nuanced, partial credit scoring in previously intractable problem formats. Extensive experiments demonstrate the effectiveness of StructVRM. Our trained model, Seed-StructVRM, achieves state-of-the-art performance on six out of…
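
To make the contrast between a single binary score and sub-question-level partial credit concrete, here is a minimal sketch. It is not the paper's implementation: the function names, the `toy_equivalent` check (a hand-written stand-in for the trained model-based verifier), and the choice to average sub-question scores into one reward are all assumptions made for illustration.

```python
from fractions import Fraction
from typing import Callable, List, Sequence, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.strip().lower().split())


def toy_equivalent(predicted: str, reference: str) -> bool:
    """Hypothetical stand-in for a learned verifier.

    Tries exact rational comparison first (so "0.5" matches "1/2"),
    then falls back to normalized text comparison.
    """
    try:
        return Fraction(predicted.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        return normalize(predicted) == normalize(reference)


def binary_reward(
    predicted: Sequence[str],
    reference: Sequence[str],
    is_equivalent: Callable[[str, str], bool] = toy_equivalent,
) -> float:
    """Coarse reward: 1.0 only if every sub-answer is judged correct."""
    if len(predicted) != len(reference):
        return 0.0
    all_correct = all(is_equivalent(p, r) for p, r in zip(predicted, reference))
    return 1.0 if all_correct else 0.0


def structured_reward(
    predicted: Sequence[str],
    reference: Sequence[str],
    is_equivalent: Callable[[str, str], bool] = toy_equivalent,
) -> Tuple[List[float], float]:
    """Fine-grained reward: one score per sub-question plus their mean.

    Averaging is an assumption of this sketch, not something the abstract specifies.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of sub-answers")
    scores = [1.0 if is_equivalent(p, r) else 0.0 for p, r in zip(predicted, reference)]
    total = sum(scores) / len(scores) if scores else 0.0
    return scores, total


if __name__ == "__main__":
    predicted = ["0.5", "paris", "7"]
    reference = ["1/2", "Paris", "8"]
    print(binary_reward(predicted, reference))
    print(structured_reward(predicted, reference))
```

In this toy case the response gets two of three sub-questions right. The binary reward collapses that to zero, while the structured reward keeps a per-sub-question score vector that a training loop could use as a denser signal.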
