The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games

Chao Yu; Akash Velu; Eugene Vinitsky; Jiaxuan Gao; Yu Wang; Alexandre Bayen; Yi Wu

arXiv:2103.01955 · cs.LG · November 7, 2022 · 591 citations

TL;DR

This paper demonstrates that Proximal Policy Optimization (PPO), a simple on-policy reinforcement learning algorithm, performs surprisingly well in cooperative multi-agent environments, often rivaling or surpassing off-policy methods with minimal tuning.

Contribution

The study reveals that PPO can be an effective baseline for cooperative multi-agent reinforcement learning, challenging the belief that it is less sample efficient than off-policy algorithms.

Findings

01 PPO achieves strong performance across four multi-agent testbeds.

02 PPO often matches or exceeds off-policy methods in sample efficiency.

03 Implementation and hyperparameter choices are crucial for PPO's success.

Abstract

Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems. In this work, we carefully study the performance of PPO in cooperative multi-agent settings. We show that PPO-based multi-agent algorithms achieve surprisingly strong performance in four popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, Google Research Football, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. Importantly, compared to competitive off-policy methods, PPO often achieves competitive or superior results in both final returns…

Code & Models

Videos

The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Sports Analytics and Performance

Methods: Entropy Regularization · Proximal Policy Optimization
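
Illustration

The two methods listed here are commonly combined in a single loss. The block below is a minimal sketch of the generic, single-agent PPO clipped surrogate objective with an entropy bonus, as it is usually stated in the PPO literature. It is not taken from this paper and does not reproduce its multi-agent implementation. The function name `ppo_clip_loss`, the clip range of 0.2, and the entropy coefficient of 0.01 are illustrative assumptions.

```python
import math

def ppo_clip_loss(new_logps, old_logps, advantages, entropies,
                  clip_eps=0.2, ent_coef=0.01):
    """Return a scalar loss to minimize (hypothetical helper, illustrative only).

    new_logps / old_logps: log-probabilities of the taken actions under the
        current policy and the policy that collected the data.
    advantages: advantage estimates for those actions.
    entropies: policy entropy at each sampled state.
    clip_eps and ent_coef are assumed values, not the paper's settings.
    """
    n = len(advantages)
    surrogate = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic choice: take the smaller of the two surrogate terms.
        surrogate += min(ratio * adv, clipped_ratio * adv)
    surrogate /= n
    # Entropy regularization: reward higher policy entropy.
    entropy_bonus = ent_coef * sum(entropies) / n
    # Negate because the objective is maximized but optimizers minimize.
    return -(surrogate + entropy_bonus)
```

With a positive advantage, the clip stops the objective from improving once the ratio exceeds 1 + clip_eps; with a negative advantage, it stops improving once the ratio falls below 1 - clip_eps. This limits how far one update can move the policy from the one that collected the data.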