Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Qiyuan Zhang; Yufei Wang; Tianhe Wu; Can Xu; Qingfeng Sun; Kai Zheng; Xue Liu; Chen Ma

arXiv:2603.01571 · cs.AI · March 3, 2026

TL;DR

This paper introduces Mix-GRM, a framework that combines two structured reasoning mechanisms, Breadth-CoT (B-CoT) and Depth-CoT (D-CoT), to improve generative reward models. It achieves state-of-the-art results across five benchmarks by aligning reasoning style with task type.

Contribution

The paper proposes Mix-GRM, a framework whose modular synthesis pipeline restructures raw rationales into Breadth-CoT and Depth-CoT reasoning. Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) then internalize and optimize both mechanisms.

Findings

01. Mix-GRM surpasses existing models by 8.2% on average across five benchmarks.

02. B-CoT is better suited for subjective preference tasks.

03. D-CoT performs best on objective correctness tasks.
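
The pairing of reasoning style and task type in the findings above can be written down as a small lookup. The sketch below is illustrative only and is not the paper's method: Mix-GRM learns its behaviour through SFT and RLVR, whereas this is a hand-written table. The names `TaskType`, `preferred_cot_style`, `build_judge_prompt`, and the prompt wording are all hypothetical; only the two style labels and their one-line descriptions come from the abstract.

```python
from enum import Enum


class TaskType(Enum):
    """Hypothetical task categories, named after the two findings."""
    SUBJECTIVE_PREFERENCE = "subjective_preference"
    OBJECTIVE_CORRECTNESS = "objective_correctness"


# Pairing reported in the findings; the dictionary itself is an assumption.
PREFERRED_STYLE = {
    TaskType.SUBJECTIVE_PREFERENCE: "B-CoT",
    TaskType.OBJECTIVE_CORRECTNESS: "D-CoT",
}

# One-line descriptions taken from the abstract.
STYLE_FOCUS = {
    "B-CoT": "multi-dimensional principle coverage",
    "D-CoT": "substantive judgment soundness",
}


def preferred_cot_style(task_type: TaskType) -> str:
    """Return the reasoning style the findings associate with a task type."""
    return PREFERRED_STYLE[task_type]


def build_judge_prompt(task_type: TaskType, instruction: str,
                       response_a: str, response_b: str) -> str:
    """Assemble a hypothetical pairwise-judging prompt for the chosen style."""
    style = preferred_cot_style(task_type)
    return (
        f"Reasoning style: {style} ({STYLE_FOCUS[style]})\n"
        f"Instruction: {instruction}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response is better?"
    )


if __name__ == "__main__":
    print(build_judge_prompt(
        TaskType.OBJECTIVE_CORRECTNESS,
        "Compute 17 * 3.",
        "51",
        "54",
    ))
```

A fixed table like this requires the task type to be known in advance, which is a simplification made here for illustration.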

Abstract

Recent advancements in Generative Reward Models (GRMs) have demonstrated that scaling the length of Chain-of-Thought (CoT) reasoning considerably enhances the reliability of evaluation. However, current works predominantly rely on unstructured length scaling, ignoring the divergent efficacy of different reasoning mechanisms: Breadth-CoT (B-CoT, i.e., multi-dimensional principle coverage) and Depth-CoT (D-CoT, i.e., substantive judgment soundness). To address this, we introduce Mix-GRM, a framework that reconfigures raw rationales into structured B-CoT and D-CoT through a modular synthesis pipeline, subsequently employing Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) to internalize and optimize these mechanisms. Comprehensive experiments demonstrate that Mix-GRM establishes a new state-of-the-art across five benchmarks, surpassing leading…

Taxonomy

Topics: Recommender Systems and Techniques · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics