Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models
Qiyuan Zhang, Yufei Wang, Tianhe Wu, Can Xu, Qingfeng Sun, Kai Zheng, Xue Liu, Chen Ma

TL;DR
This paper introduces Mix-GRM, a framework that combines two structured reasoning mechanisms, Breadth-CoT (B-CoT) and Depth-CoT (D-CoT), to improve generative reward models. It reports state-of-the-art results across five benchmarks and finds that performance depends on matching the reasoning style to the task type.
Contribution
The paper proposes Mix-GRM, a modular framework that restructures raw rationales into Breadth-CoT (B-CoT) and Depth-CoT (D-CoT) reasoning, then internalizes and optimizes both through Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR).
Findings
Mix-GRM surpasses existing models by 8.2% on average across five benchmarks.
B-CoT is better suited for subjective preference tasks.
D-CoT performs best on objective correctness tasks.
Abstract
Recent advancements in Generative Reward Models (GRMs) have demonstrated that scaling the length of Chain-of-Thought (CoT) reasoning considerably enhances the reliability of evaluation. However, current works predominantly rely on unstructured length scaling, ignoring the divergent efficacy of different reasoning mechanisms: Breadth-CoT (B-CoT, i.e., multi-dimensional principle coverage) and Depth-CoT (D-CoT, i.e., substantive judgment soundness). To address this, we introduce Mix-GRM, a framework that reconfigures raw rationales into structured B-CoT and D-CoT through a modular synthesis pipeline, subsequently employing Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) to internalize and optimize these mechanisms. Comprehensive experiments demonstrate that Mix-GRM establishes a new state-of-the-art across five benchmarks, surpassing leading…
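The abstract names RLVR as the optimization stage, but the excerpt reproduced here does not say how the reward is computed. As a minimal sketch only, assuming a pairwise setting in which the reward model writes a rationale and ends with a verdict, a verifiable reward could simply check that verdict against the gold preference label. The `[[A]]`/`[[B]]` verdict format and the function names `extract_verdict` and `verifiable_reward` are illustrative assumptions, not taken from the paper.

```python
import re
from typing import Optional

# Hypothetical verdict format (an assumption, not from the paper):
# the judge closes its rationale with "[[A]]" or "[[B]]".
_VERDICT = re.compile(r"\[\[([AB])\]\]")


def extract_verdict(judgment: str) -> Optional[str]:
    """Return the last verdict marker found in the judgment, or None."""
    matches = _VERDICT.findall(judgment)
    return matches[-1] if matches else None


def verifiable_reward(judgment: str, gold: str) -> float:
    """Binary reward: 1.0 if the parsed verdict equals the gold label, else 0.0.

    A judgment with no parseable verdict earns 0.0, so malformed outputs
    are never rewarded.
    """
    return 1.0 if extract_verdict(judgment) == gold else 0.0


if __name__ == "__main__":
    sample = "Response A covers more criteria and is factually sound. [[A]]"
    print(verifiable_reward(sample, gold="A"))
    print(verifiable_reward("No clear winner.", gold="A"))
```

Because the reward depends only on the final verdict, it is indifferent to whether the rationale is breadth-style or depth-style; any preference between the two would have to emerge from which style leads to correct verdicts more often on a given task type.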
Taxonomy
Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
