The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games
Chao Yu, Akash Velu, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, Yi Wu

TL;DR
This paper demonstrates that Proximal Policy Optimization (PPO), a simple on-policy reinforcement learning algorithm, performs surprisingly well in cooperative multi-agent environments, often rivaling or surpassing off-policy methods with minimal tuning.
Contribution
The study reveals that PPO can be an effective baseline for cooperative multi-agent reinforcement learning, challenging the belief that it is less sample efficient than off-policy algorithms.
Findings
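PPO achieves strong performance across four multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, Google Research Football, and the Hanabi challenge.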
PPO often matches or exceeds off-policy methods in sample efficiency.
Implementation and hyperparameter choices are crucial for PPO's success.
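To make "implementation choices" concrete, here is one detail that is common in PPO implementations: normalizing value-function targets with running statistics. This is a minimal illustrative sketch, not the paper's code. This summary does not list which choices the paper studies, and the class name `RunningValueNormalizer` is hypothetical.

```python
import numpy as np


class RunningValueNormalizer:
    """Hypothetical helper that tracks the running mean/variance of value targets.

    Each batch's statistics are merged into the running totals using the
    parallel (batched) variance update, so the result matches the statistics
    of all data seen so far.
    """

    def __init__(self, epsilon: float = 1e-8) -> None:
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon  # tiny prior count avoids division by zero
        self.epsilon = epsilon

    def update(self, targets: np.ndarray) -> None:
        batch_mean = float(np.mean(targets))
        batch_var = float(np.var(targets))
        batch_count = targets.size
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Merge the means, weighted by sample counts.
        self.mean += delta * batch_count / total
        # Merge the sums of squared deviations, then convert back to a variance.
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + self.epsilon)

    def denormalize(self, x):
        return x * np.sqrt(self.var + self.epsilon) + self.mean
```

In this arrangement, which is assumed here rather than taken from the paper, the critic is trained on normalized targets, and its predictions are denormalized before advantages are computed.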
Abstract
Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems. In this work, we carefully study the performance of PPO in cooperative multi-agent settings. We show that PPO-based multi-agent algorithms achieve surprisingly strong performance in four popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, Google Research Football, and the Hanabi challenge, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. Importantly, compared to competitive off-policy methods, PPO often achieves competitive or superior results in both final returns…
Taxonomy
Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Sports Analytics and Performance
Methods: Entropy Regularization · Proximal Policy Optimization
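The two listed methods are usually combined in the standard PPO clipped surrogate objective with an entropy bonus. Below is a minimal NumPy sketch. It assumes that per-sample log-probabilities, advantages, and policy entropies have already been computed. The function name `ppo_loss` and the defaults `clip_eps=0.2` and `entropy_coef=0.01` are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def ppo_loss(new_log_probs, old_log_probs, advantages, entropy,
             clip_eps=0.2, entropy_coef=0.01):
    """Clipped surrogate policy loss with an entropy bonus, returned as a value to minimize."""
    # Probability ratio between the current policy and the data-collecting policy.
    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise minimum gives a pessimistic bound on the policy improvement.
    surrogate = np.minimum(unclipped, clipped).mean()
    # Entropy regularization: a higher entropy lowers the loss, which encourages exploration.
    return -(surrogate + entropy_coef * entropy.mean())
```

Taking the minimum of the clipped and unclipped terms matters in one specific case. Once the ratio leaves [1 - clip_eps, 1 + clip_eps] in the direction that would raise the objective, that sample stops contributing gradient. Samples from several agents could be pooled into the same arrays; this batching is an assumption, not a detail of the paper.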
