Counterfactual Multi-Agent Policy Gradients

Jakob Foerster; Gregory Farquhar; Triantafyllos Afouras; Nantas Nardelli; Shimon Whiteson

arXiv:1705.08926 · cs.AI · December 12, 2024 · 478 citations



TL;DR

This paper introduces COMA, a multi-agent reinforcement learning method with a counterfactual baseline that improves decentralized policy learning in complex settings such as StarCraft unit micromanagement.

Contribution

The paper proposes COMA, a novel multi-agent actor-critic algorithm that pairs a centralized critic with decentralized actors and uses a counterfactual baseline to address multi-agent credit assignment.
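For reference, the counterfactual baseline can be written out explicitly. The formulation below follows the standard COMA notation as we understand it and is a sketch rather than a quotation from the paper: s is the global state, \mathbf{u} the joint action, \mathbf{u}^{-a} the actions of all agents other than a, \tau^{a} agent a's action-observation history, and \pi^{a} agent a's policy. Each agent's advantage compares the critic's value of the joint action actually taken with the value expected when only agent a's action is resampled from its own policy:

```latex
A^{a}(s, \mathbf{u}) = Q(s, \mathbf{u}) - \sum_{u'^{a}} \pi^{a}\!\left(u'^{a} \mid \tau^{a}\right) Q\!\left(s, \left(\mathbf{u}^{-a}, u'^{a}\right)\right)

g = \mathbb{E}_{\pi}\!\left[\sum_{a} \nabla_{\theta} \log \pi^{a}\!\left(u^{a} \mid \tau^{a}\right) A^{a}(s, \mathbf{u})\right]
```

Because the baseline term does not depend on agent a's own action, subtracting it leaves the expected policy gradient unchanged while isolating each agent's individual contribution to the shared reward.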

Findings

01. COMA outperforms other multi-agent actor-critic methods on StarCraft micromanagement tasks.
02. COMA's performance is competitive with that of centralized controllers that have access to the full state.
03. The counterfactual baseline effectively addresses the multi-agent credit assignment problem.

Abstract

Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of…
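The abstract's two mechanisms, marginalising out one agent's action and computing the baseline in a single forward pass, can be illustrated with a minimal Python sketch. It assumes the centralized critic has already produced, for each agent a, a vector of Q-values over all of a's actions with the other agents' actions held fixed; a critic that outputs such a vector per agent is what makes the baseline a single weighted sum instead of one critic call per counterfactual action. The function name `coma_advantages` and its list-based inputs are hypothetical and not taken from the authors' code.

```python
from typing import List, Sequence


def coma_advantages(
    q_values: Sequence[Sequence[float]],
    policies: Sequence[Sequence[float]],
    actions: Sequence[int],
) -> List[float]:
    """Counterfactual advantage A^a for each agent (illustrative sketch).

    q_values[a][k]: assumed critic output Q(s, (u^{-a}, k)), i.e. the value of
        agent a taking action k while all other agents keep their actions.
    policies[a][k]: agent a's probability pi^a(k | tau^a).
    actions[a]: index of the action agent a actually took.
    """
    if not (len(q_values) == len(policies) == len(actions)):
        raise ValueError("need one Q-row, one policy and one action per agent")

    advantages = []
    for q_row, pi_row, u in zip(q_values, policies, actions):
        if len(q_row) != len(pi_row):
            raise ValueError("Q-values and policy must cover the same action set")
        # Counterfactual baseline: expected Q when only this agent's action
        # is resampled from its own policy (others' actions stay fixed).
        baseline = sum(p * q for p, q in zip(pi_row, q_row))
        # q_row[u] is Q(s, u) for the joint action that was actually taken.
        advantages.append(q_row[u] - baseline)
    return advantages


if __name__ == "__main__":
    # Two hypothetical agents with three actions each.
    q = [[1.0, 2.0, 3.0], [0.5, 0.5, 2.0]]
    pi = [[0.25, 0.25, 0.5], [0.5, 0.25, 0.25]]
    print(coma_advantages(q, pi, [2, 0]))  # → [0.75, -0.375]
```

In this toy example, agent 0 chose its highest-valued action and receives a positive advantage, while agent 1 chose an action worth less than its policy's average and receives a negative one, even though both agents share the same team reward.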



Taxonomy

Topics: Reinforcement Learning in Robotics