Optimistic Multi-Agent Policy Gradient
Wenshuai Zhao, Yi Zhao, Zhiyuan Li, Juho Kannala, Joni Pajarinen

TL;DR
This paper introduces a simple framework for optimistic updates in multi-agent policy gradient methods to address the relative overgeneralization problem, improving performance on various cooperative tasks.
Contribution
It proposes a clipping-based advantage modification to enable optimistic updates, preventing premature convergence to suboptimal policies in multi-agent learning.
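The clipping idea described above can be illustrated with a minimal sketch. It assumes the simplest reading of "clipping the advantage to eliminate negative values", namely replacing each advantage A with max(A, 0). The function names `optimistic_advantages` and `surrogate_loss` are hypothetical, and the loss is a generic REINFORCE-style surrogate chosen for illustration, not necessarily the objective used by the authors.

```python
from typing import List, Sequence


def optimistic_advantages(advantages: Sequence[float]) -> List[float]:
    """Clip negative advantages to zero (assumed form: max(A, 0)).

    Only actions that did better than expected contribute to the update,
    so an agent is not pushed away from an action just because its
    teammates were still behaving suboptimally.
    """
    return [max(a, 0.0) for a in advantages]


def surrogate_loss(log_probs: Sequence[float], advantages: Sequence[float]) -> float:
    """Generic policy-gradient surrogate: -mean(log pi(a|s) * clipped A).

    Illustrative only; a PPO-style objective could be substituted here.
    """
    clipped = optimistic_advantages(advantages)
    return -sum(lp * a for lp, a in zip(log_probs, clipped)) / len(clipped)


if __name__ == "__main__":
    advs = [-1.0, 0.5, 0.0]
    print(optimistic_advantages(advs))
    print(surrogate_loss([-1.0, -2.0], [2.0, -3.0]))
```

In this sketch a sample with a negative advantage contributes zero to the loss, so it produces no gradient that would lower the probability of the sampled action.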
Findings
Outperforms baselines on 13 out of 19 tasks
Retains optimality at fixed points
Effective on Multi-agent MuJoCo and Overcooked benchmarks
Abstract
*Relative overgeneralization* (RO) occurs in cooperative multi-agent learning tasks when agents converge towards a suboptimal joint policy due to overfitting to suboptimal behavior of other agents. No methods have been proposed for addressing RO in multi-agent policy gradient (MAPG) methods although these methods produce state-of-the-art results. To address this gap, we propose a general, yet simple, framework to enable optimistic updates in MAPG methods that alleviate the RO problem. Our approach involves clipping the advantage to eliminate negative values, thereby facilitating optimistic updates in MAPG. The optimism prevents individual agents from quickly converging to a local optimum. Additionally, we provide a formal analysis to show that the proposed method retains optimality at a fixed point. In extensive evaluations on a diverse set of tasks including the *Multi-agent MuJoCo*…
Taxonomy
Topics: Machine Learning and ELM · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
Methods: Sparse Evolutionary Training
