Optimistic Multi-Agent Policy Gradient

Wenshuai Zhao; Yi Zhao; Zhiyuan Li; Juho Kannala; Joni Pajarinen

arXiv:2311.01953 · cs.LG · October 15, 2025 · 1 citation

TL;DR

This paper introduces a simple framework for optimistic updates in multi-agent policy gradient methods to address the relative overgeneralization problem, improving performance on various cooperative tasks.

Contribution

It proposes a clipping-based advantage modification to enable optimistic updates, preventing premature convergence to suboptimal policies in multi-agent learning.

Findings

1. Outperforms baselines on 13 out of 19 tasks
2. Retains optimality at fixed points
3. Effective on Multi-agent MuJoCo and Overcooked benchmarks

Abstract

*Relative overgeneralization* (RO) occurs in cooperative multi-agent learning tasks when agents converge towards a suboptimal joint policy due to overfitting to suboptimal behavior of other agents. No methods have been proposed for addressing RO in multi-agent policy gradient (MAPG) methods although these methods produce state-of-the-art results. To address this gap, we propose a general, yet simple, framework to enable optimistic updates in MAPG methods that alleviate the RO problem. Our approach involves clipping the advantage to eliminate negative values, thereby facilitating optimistic updates in MAPG. The optimism prevents individual agents from quickly converging to a local optimum. Additionally, we provide a formal analysis to show that the proposed method retains optimality at a fixed point. In extensive evaluations on a diverse set of tasks including the *Multi-agent MuJoCo*…
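
The advantage clipping described in the abstract can be illustrated with a minimal sketch. This is not taken from the authors' repository: it assumes the clipping threshold is exactly zero, and it pairs the clipped advantages with a standard PPO-style clipped surrogate purely for illustration. The function names `optimistic_advantages` and `ppo_clipped_surrogate`, the sample values, and `eps=0.2` are all hypothetical.

```python
from typing import List


def optimistic_advantages(advantages: List[float]) -> List[float]:
    """Replace negative advantage estimates with zero (assumed threshold: 0)."""
    return [max(a, 0.0) for a in advantages]


def ppo_clipped_surrogate(ratios: List[float],
                          advantages: List[float],
                          eps: float = 0.2) -> float:
    """Mean of the PPO clipped surrogate objective (assumed base objective)."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)


# Hypothetical per-sample advantage estimates and probability ratios.
raw_adv = [1.5, -0.7, 0.0, -2.0]
ratios = [1.1, 0.9, 1.0, 1.3]

opt_adv = optimistic_advantages(raw_adv)            # negatives become 0.0
standard = ppo_clipped_surrogate(ratios, raw_adv)   # uses raw advantages
optimistic = ppo_clipped_surrogate(ratios, opt_adv) # uses clipped advantages

print(opt_adv, standard, optimistic)
```

With negative advantages zeroed out, samples where an action looked bad (possibly because other agents were still acting suboptimally) contribute no gradient signal, so the agent is not pushed away from those actions. Only samples with positive advantage drive the update.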

Code & Models

Repositories

wenshuaizhao/optimappo (PyTorch, official)

Taxonomy

Topics: Machine Learning and ELM · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning

Methods: Sparse Evolutionary Training