Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients

Bozhidar Vasilev; Tarun Gupta; Bei Peng; Shimon Whiteson

arXiv:2104.13446·cs.LG·May 7, 2021·1 cites

TL;DR

This paper introduces semi-on-policy training to improve the sample efficiency of multi-agent policy gradient methods. On the SMAC benchmark, two policy gradient algorithms enhanced this way perform as well as or better than state-of-the-art value-based methods.

Contribution

The paper proposes semi-on-policy training and applies it to two state-of-the-art multi-agent policy gradient algorithms, closing the performance gap with value-based methods on SMAC.

Findings

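01. Semi-on-policy training significantly improves the performance of two state-of-the-art policy gradient algorithms on SMAC tasks.

02. The enhanced policy gradient algorithms perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks.

03. Semi-on-policy training addresses the sample inefficiency of on-policy policy gradient methods in a computationally efficient way.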

Abstract

Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios. However, there is a significant performance gap between state-of-the-art policy gradient and value-based methods on the popular StarCraft Multi-Agent Challenge (SMAC) benchmark. In this paper, we introduce semi-on-policy (SOP) training as an effective and computationally efficient way to address the sample inefficiency of on-policy policy gradient methods. We enhance two state-of-the-art policy gradient algorithms with SOP training, demonstrating significant performance improvements. Furthermore, we show that our methods perform as well as or better than state-of-the-art value-based methods on a variety of SMAC tasks.

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research