Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning
Pei Yang, Ke Zhang, Ji Wang, Xiao Chen, Yuxin Tang, Eric Yang, Lynn Ai, Bill Shi

TL;DR
This paper introduces CRM, a multi-agent framework for reward modeling in reinforcement learning that improves robustness and interpretability by decomposing evaluation into specialized agents and aggregating their signals.
Contribution
The paper proposes CRM, a novel multi-agent collaborative reward model that enhances transparency and stability in RLHF by decomposing preferences and integrating signals from domain-specific evaluators.
Findings
CRM improves the robustness of reward signals.
Decomposed evaluation makes reward assignments more interpretable.
Multi-perspective rewards support stable policy optimization.
Abstract
We present CRM (Multi-Agent Collaborative Reward Model), a framework that replaces a single black-box reward model with a coordinated team of specialist evaluators to improve robustness and interpretability in RLHF. Conventional reward models struggle to jointly optimize multiple, sometimes conflicting, preference dimensions (e.g., factuality, helpfulness, safety) and offer limited transparency into why a score is assigned. CRM addresses these issues by decomposing preference evaluation into domain-specific agents that each produce partial signals, alongside global evaluators such as ranker-based and embedding-similarity rewards. A centralized aggregator fuses these signals at each timestep, balancing factors like step-wise correctness, multi-agent agreement, and repetition penalties, yielding a single training reward compatible with standard RL pipelines. The policy is optimized with…
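The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract names the factors being balanced (step-wise correctness, multi-agent agreement, repetition penalties) but gives no formula, so everything below is an assumption made for illustration: the linear weighting, the weight values, the standard-deviation agreement measure, the n-gram repetition measure, and all names (`AggregatorWeights`, `agreement_score`, `repetition_penalty`, `aggregate_reward`). It is not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import pstdev
from typing import Dict, Sequence


@dataclass(frozen=True)
class AggregatorWeights:
    # Hypothetical weights; the source does not report any values.
    correctness: float = 1.0
    agreement: float = 0.5
    repetition: float = 0.5


def agreement_score(agent_scores: Sequence[float]) -> float:
    """Return 1.0 when all agents agree, lower as their scores spread out.

    Assumes each agent score lies in [0, 1]; the population standard
    deviation of such values is at most 0.5, so we rescale to [0, 1].
    """
    if len(agent_scores) < 2:
        return 1.0
    return 1.0 - 2.0 * pstdev(agent_scores)


def repetition_penalty(tokens: Sequence[str], n: int = 2) -> float:
    """Fraction of n-grams in the step that repeat an earlier n-gram."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def aggregate_reward(
    agent_scores: Dict[str, float],
    step_correct: float,
    tokens: Sequence[str],
    weights: AggregatorWeights = AggregatorWeights(),
) -> float:
    """Fuse per-step signals into one scalar reward (assumed linear form)."""
    scores = list(agent_scores.values())
    return (
        weights.correctness * step_correct
        + weights.agreement * agreement_score(scores)
        - weights.repetition * repetition_penalty(tokens)
    )


if __name__ == "__main__":
    # Three hypothetical specialist agents scoring one reasoning step.
    step_scores = {"factuality": 0.8, "helpfulness": 0.8, "safety": 0.8}
    reward = aggregate_reward(step_scores, 1.0, "the answer is four".split())
    print(reward)
```

Because the output is a single scalar per step, a reward of this shape could be passed to a standard RL training loop in place of a single reward model's score, which matches the compatibility claim in the abstract. Keeping the per-agent scores in a dictionary also preserves the decomposed view for inspection.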
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Emotion and Mood Recognition
