Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems

Lang Feng; Longtao Zheng; Shuo He; Fuxiang Zhang; Bo An

arXiv:2602.08847 · cs.LG · February 10, 2026

TL;DR

This paper introduces Dr. MAS, a stable reinforcement learning method for multi-agent LLM systems that normalizes advantages per agent, leading to more reliable training and improved performance on reasoning and search benchmarks.

Contribution

The paper identifies a key instability in multi-agent RL training and proposes Dr. MAS, a normalization-based approach that stabilizes training and enhances multi-agent LLM system performance.

Findings

01. Dr. MAS significantly improves performance on math reasoning and search benchmarks.

02. It reduces gradient spikes and stabilizes training in multi-agent LLM systems.

03. It remains effective under heterogeneous agent-model configurations.

Abstract

Multi-agent LLM systems enable advanced reasoning and tool use via role specialization, yet reliable reinforcement learning (RL) post-training for such systems remains difficult. In this work, we theoretically pinpoint a key reason for training instability when extending group-based RL to multi-agent LLM systems. We show that under GRPO-style optimization, a global normalization baseline may deviate from diverse agents' reward distributions, which ultimately leads to gradient-norm instability. Based on this finding, we propose Dr. MAS, a simple and stable RL training recipe for multi-agent LLM systems. Dr. MAS uses an agent-wise remedy: normalizing advantages per agent using each agent's own reward statistics, which calibrates gradient scales and dramatically stabilizes training, both theoretically and empirically. Beyond the algorithm, Dr. MAS provides an end-to-end RL training…
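The contrast the abstract draws between a global normalization baseline and agent-wise normalization can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `EPS` constant, the agent labels, and the reward values are all hypothetical, and the sketch assumes each sample carries a scalar reward and the identifier of the agent that produced it.

```python
from collections import defaultdict
from statistics import mean, pstdev

EPS = 1e-8  # assumed small constant to avoid division by zero


def global_advantages(rewards):
    """GRPO-style baseline: normalize every reward with one shared mean and std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + EPS) for r in rewards]


def agent_wise_advantages(rewards, agent_ids):
    """Normalize each reward using only the statistics of its own agent."""
    by_agent = defaultdict(list)
    for r, a in zip(rewards, agent_ids):
        by_agent[a].append(r)
    # Per-agent (mean, population std) computed from that agent's rewards alone.
    stats = {a: (mean(rs), pstdev(rs)) for a, rs in by_agent.items()}
    return [(r - stats[a][0]) / (stats[a][1] + EPS)
            for r, a in zip(rewards, agent_ids)]


# Hypothetical data: the "solver" agent sees rewards near 1, the "searcher" near 0.
rewards = [0.9, 1.0, 0.8, 1.0, 0.1, 0.0, 0.2, 0.1]
agent_ids = ["solver"] * 4 + ["searcher"] * 4

adv_global = global_advantages(rewards)
adv_agent = agent_wise_advantages(rewards, agent_ids)

print("global:    ", [round(a, 2) for a in adv_global])
print("agent-wise:", [round(a, 2) for a in adv_agent])
```

In this toy setting the shared baseline sits between the two agents' reward distributions, so every solver sample receives a positive advantage and every searcher sample a negative one, regardless of how good the sample was relative to that agent's own behavior. Agent-wise normalization instead centers each agent's advantages at zero with unit scale, which is the calibration effect the abstract attributes to Dr. MAS.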

Code & Models

Models

zhangzhifang/verl-agent (Hugging Face model)

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Robot Manipulation and Learning