Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning

Pei Yang; Ke Zhang; Ji Wang; Xiao Chen; Yuxin Tang; Eric Yang; Lynn Ai; Bill Shi

arXiv:2511.16202 · cs.AI · January 6, 2026

TL;DR

This paper introduces CRM, a multi-agent framework for reward modeling in reinforcement learning that improves robustness and interpretability by decomposing evaluation into specialized agents and aggregating their signals.

Contribution

The paper proposes CRM, a multi-agent collaborative reward model intended to make RLHF more transparent and stable by decomposing preference evaluation across domain-specific evaluators and integrating their signals into one reward.

Findings

01. CRM improves the robustness of reward signals.

02. Decomposing evaluation across specialist agents enhances interpretability.

03. Multi-perspective rewards support stable policy optimization.

Abstract

We present CRM (Multi-Agent Collaborative Reward Model), a framework that replaces a single black-box reward model with a coordinated team of specialist evaluators to improve robustness and interpretability in RLHF. Conventional reward models struggle to jointly optimize multiple, sometimes conflicting, preference dimensions (e.g., factuality, helpfulness, safety) and offer limited transparency into why a score is assigned. CRM addresses these issues by decomposing preference evaluation into domain-specific agents that each produce partial signals, alongside global evaluators such as ranker-based and embedding-similarity rewards. A centralized aggregator fuses these signals at each timestep, balancing factors like step-wise correctness, multi-agent agreement, and repetition penalties, yielding a single training reward compatible with standard RL pipelines. The policy is optimized with…
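
The aggregation step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's formulation: the weighted-sum form, the weight values, the names `AggregatorWeights`, `agreement`, `repetition_penalty`, and `aggregate_reward`, and the choice of score spread and repeated bigrams as proxies for multi-agent agreement and repetition. Agent scores are assumed to lie in [0, 1].

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Dict, Sequence


@dataclass(frozen=True)
class AggregatorWeights:
    # Hypothetical weights; the paper's actual values are not given here.
    correctness: float = 1.0
    agreement: float = 0.5
    repetition: float = 0.5


def repetition_penalty(tokens: Sequence[str], n: int = 2) -> float:
    """Fraction of n-grams that duplicate an earlier n-gram (0.0 = no repeats)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def agreement(agent_scores: Dict[str, float]) -> float:
    """Agreement in [0, 1] for scores in [0, 1]; 1.0 means identical scores."""
    values = list(agent_scores.values())
    if len(values) < 2:
        return 1.0
    # The population std of values in [0, 1] is at most 0.5, hence the factor 2.
    return 1.0 - 2.0 * pstdev(values)


def aggregate_reward(
    agent_scores: Dict[str, float],
    step_correct: bool,
    tokens: Sequence[str],
    weights: AggregatorWeights = AggregatorWeights(),
) -> float:
    """Fuse specialist scores into one scalar reward for a single step."""
    if not agent_scores:
        raise ValueError("at least one agent score is required")
    mean_score = sum(agent_scores.values()) / len(agent_scores)
    return (
        mean_score
        + weights.correctness * float(step_correct)
        + weights.agreement * agreement(agent_scores)
        - weights.repetition * repetition_penalty(tokens)
    )


scores = {"factuality": 0.8, "helpfulness": 0.8, "safety": 0.8}
reward = aggregate_reward(scores, step_correct=True, tokens=["a", "b", "c"])
```

Because the output is a single scalar per step, a reward of this shape can be passed to a standard RL training loop without changing the optimizer, which is the compatibility property the abstract emphasizes.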

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Emotion and Mood Recognition