Counterfactual Multi-Agent Policy Gradients
Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

TL;DR
This paper introduces COMA, a multi-agent reinforcement learning method with a counterfactual baseline that improves decentralized policy learning in complex environments like StarCraft.
Contribution
The paper proposes COMA, a novel multi-agent actor-critic algorithm using a counterfactual baseline for efficient credit assignment and improved decentralized policy optimization.
Findings
COMA outperforms other multi-agent actor-critic methods in StarCraft micromanagement tasks.
COMA's performance is competitive with centralized controllers with full state access.
The counterfactual baseline effectively addresses multi-agent credit assignment challenges.
Abstract
Cooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of…
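The counterfactual baseline described in the abstract can be illustrated with a small numerical sketch. It assumes the centralised critic has already returned, for one agent, a vector of Q-values covering each of that agent's possible actions while the other agents' actions stay fixed. The function name, argument names, and the toy numbers are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def counterfactual_advantage(q_values, policy_probs, action_taken):
    """Advantage of one agent's chosen action against a counterfactual baseline.

    q_values:     assumed critic output, shape (n_actions,). Entry i is the
                  Q-value of the joint action in which this agent takes
                  action i and all other agents' actions are held fixed.
    policy_probs: this agent's policy over its own actions, shape (n_actions,).
    action_taken: index of the action the agent actually took.
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_probs = np.asarray(policy_probs, dtype=float)
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = float(np.dot(policy_probs, q_values))
    # Advantage: how much better the chosen action is than the baseline.
    return float(q_values[action_taken] - baseline)

# Toy example (made-up numbers): three actions for a single agent.
q = [1.0, 3.0, 2.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, action_taken=1)
```

Because the baseline depends only on the other agents' actions and the agent's own policy, the advantage averages to zero under that policy, so subtracting it leaves the expected policy gradient unchanged while isolating the agent's own contribution.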
Taxonomy
Topics: Fuel Cells and Related Materials · Reinforcement Learning in Robotics
