FACMAC: Factored Multi-Agent Centralised Policy Gradients

Bei Peng; Tabish Rashid; Christian A. Schroeder de Witt; Pierre-Alexandre Kamienny; Philip H. S. Torr; Wendelin Böhmer; Shimon Whiteson

arXiv:2003.06709·cs.LG·May 10, 2021·106 cites

TL;DR

FACMAC pairs a centralised but factored critic, which can be monotonic or nonmonotonic, with a centralised policy gradient for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. It is reported to outperform MADDPG and other baselines on multiple benchmarks.

Contribution

The paper presents a centralised but factored critic, including a nonmonotonic variant with greater representational capacity, and a centralised policy gradient method for learning coordinated policies in multi-agent settings.

Findings

01

FACMAC outperforms MADDPG and other baselines on multiple benchmarks.

02

Nonmonotonic critic factorisation solves some tasks that monolithic or monotonically factored critics cannot.

03

Centralized policy gradient improves coordination among agents.

Abstract

We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilities into the joint action-value function via a non-linear monotonic function, as in QMIX, a popular multi-agent Q-learning algorithm. However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic, or monotonically factored critics. In addition, FACMAC uses a centralised policy gradient…
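The difference between a monotonic and a nonmonotonic factorisation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-layer shape, and the fixed weights are assumptions made for illustration. In QMIX the mixing weights are produced by hypernetworks conditioned on the state; here they are passed in directly to keep the example self-contained.

```python
import math


def elu(x):
    """Exponential linear unit, a monotonically increasing activation."""
    return x if x > 0 else math.exp(x) - 1.0


def mix(utilities, w1, b1, w2, b2, monotonic=True):
    """Combine per-agent utilities into one joint value (hypothetical helper).

    utilities: list of per-agent utilities Q_i
    w1: hidden-layer weights, one row of len(utilities) per hidden unit
    b1: hidden-layer biases, one per hidden unit
    w2: output weights, one per hidden unit
    b2: scalar output bias
    monotonic: if True, take absolute values of all weights so the joint
        value can never decrease when any single Q_i increases.
    """
    constrain = abs if monotonic else (lambda w: w)
    hidden = [
        elu(sum(constrain(w) * q for w, q in zip(row, utilities)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(constrain(w) * h for w, h in zip(w2, hidden)) + b2


# Illustrative weights (assumed, not taken from the paper).
W1 = [[-1.0, 0.5], [0.3, -0.8]]
B1 = [0.0, 0.1]
W2 = [1.0, -0.5]
B2 = 0.2

low = [0.0, 0.0]
high = [1.0, 0.0]  # only agent 0's utility is raised

# With the constraint, raising a utility cannot lower the joint value.
mono_ok = mix(high, W1, B1, W2, B2, True) >= mix(low, W1, B1, W2, B2, True)
# Without it, the same change is free to lower the joint value.
free_drops = mix(high, W1, B1, W2, B2, False) < mix(low, W1, B1, W2, B2, False)
```

The unconstrained variant can represent joint values that a monotonic mixer cannot, which is the extra representational capacity the abstract refers to. A critic used only for policy gradients does not need the monotonicity constraint that QMIX relies on to select greedy actions per agent.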

Code & Models

Videos

FACMAC: Factored Multi-Agent Centralised Policy Gradients · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control

Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · MADDPG