Adaptive Robust Estimator for Multi-Agent Reinforcement Learning

Zhongyi Li; Wan Tian; Jingyu Chen; Kangyao Huang; Huiming Zhang; Hui Yang; Tao Ren; Jinyang Jiang; Yijie Peng; Yikun Ban; Fuzhen Zhuang

arXiv:2603.21574 · cs.AI · March 24, 2026

TL;DR

This paper introduces a robust multi-agent reinforcement learning framework with a structured reasoning pipeline and an adaptive estimator, improving stability and performance in noisy reward environments.

Contribution

The paper proposes the Dual-Agent Answer-Critique-Rewrite (DACR) framework and the Adaptive Robust Estimator (ARE), which improve credit attribution and robustness in multi-agent collaborative reasoning tasks.

Findings

1. Outperforms baselines on mathematical reasoning benchmarks.

2. Demonstrates robustness to reward noise in experiments.

3. Achieves more stable training dynamics.

Abstract

Multi-agent collaboration has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models, yet it suffers from interaction-level ambiguity that blurs generation, critique, and revision, making credit assignment across agents difficult. Moreover, policy optimization in this setting is vulnerable to heavy-tailed and noisy rewards, which can bias advantage estimation and trigger unstable or even divergent training. To address both issues, we propose a robust multi-agent reinforcement learning framework for collaborative reasoning, consisting of two components: Dual-Agent Answer-Critique-Rewrite (DACR) and an Adaptive Robust Estimator (ARE). DACR decomposes reasoning into a structured three-stage pipeline: answer, critique, and rewrite, while enabling explicit attribution of each agent's marginal contribution to its partner's performance. ARE provides…
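
The abstract's claim that heavy-tailed rewards can bias advantage estimation can be illustrated with a small sketch. This is not the paper's ARE, whose definition is cut off in the excerpt above. It is a generic comparison, under our own assumptions, between a mean/standard-deviation baseline and a median/MAD baseline. The function names, the sample rewards, and the 1.4826 scaling constant (which makes the MAD comparable to a standard deviation under Gaussian noise) are illustrative choices, not taken from the paper.

```python
import statistics

EPS = 1e-8  # hypothetical small constant to avoid division by zero


def mean_std_advantages(rewards):
    """Advantages centred on the sample mean and scaled by the std deviation."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + EPS) for r in rewards]


def median_mad_advantages(rewards):
    """Advantages centred on the median and scaled by the MAD.

    Illustrative robust alternative only; NOT the paper's ARE.
    """
    med = statistics.median(rewards)
    mad = statistics.median([abs(r - med) for r in rewards])
    return [(r - med) / (1.4826 * mad + EPS) for r in rewards]


# Hypothetical group of rewards: mostly 0/1 correctness scores plus one
# heavy-tailed outlier produced by a noisy reward signal.
rewards = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 50.0]

naive = mean_std_advantages(rewards)
robust = median_mad_advantages(rewards)

for r, a_naive, a_robust in zip(rewards, naive, robust):
    print(f"reward={r:5.1f}  mean/std={a_naive:+.3f}  median/MAD={a_robust:+.3f}")
```

With these sample rewards the single outlier pulls the mean well above 1.0, so every ordinary reward, correct or not, receives a negative advantage, and the gap between rewards 0 and 1 is compressed by the inflated standard deviation. The median baseline stays at 1.0 and keeps the two groups clearly separated. The outlier itself still receives a large advantage under the median/MAD scaling; bounding it (for example by clipping) would be a further design choice outside this sketch.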

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics