FACMAC: Factored Multi-Agent Centralised Policy Gradients
Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, Shimon Whiteson

TL;DR
FACMAC pairs a centralised but factored critic, which can be monotonic or nonmonotonic, with a centralised policy gradient for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
Contribution
The paper presents a centralised critic that is factored into per-agent utilities, in both monotonic and nonmonotonic variants, together with a centralised policy gradient for learning the agents' policies.
Findings
FACMAC outperforms MADDPG and other baselines on multiple benchmarks.
A nonmonotonic critic factorisation solves some tasks that monolithic or monotonically factored critics cannot.
Centralized policy gradient improves coordination among agents.
Abstract
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilities into the joint action-value function via a non-linear monotonic function, as in QMIX, a popular multi-agent Q-learning algorithm. However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic, or monotonically factored critics. In addition, FACMAC uses a centralised policy gradient…
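The abstract's idea of combining per-agent utilities through a non-linear monotonic function can be illustrated with a small sketch. This is not the authors' implementation: it assumes a two-layer QMIX-style mixer, and the names `mix`, `elu`, `w1`, `b1`, `w2`, `b2` are hypothetical. The weights are fixed random numbers standing in for whatever a learned network would produce. Taking the absolute value of the weights is one common way to enforce monotonicity; dropping that constraint gives a nonmonotonic mixer.

```python
import math
import random


def elu(x):
    """ELU activation; it is increasing, so it preserves monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def mix(utilities, w1, b1, w2, b2, monotonic=True):
    """Combine per-agent utilities into one joint value (illustrative only).

    With monotonic=True, abs() makes every mixing weight non-negative, so
    raising any single agent's utility can never lower the joint value.
    With monotonic=False the raw weights are used and no such guarantee holds.
    """
    constrain = abs if monotonic else (lambda w: w)
    hidden = [
        elu(sum(constrain(w1[i][j]) * u for i, u in enumerate(utilities)) + b1[j])
        for j in range(len(b1))
    ]
    return sum(constrain(w2[j]) * h for j, h in enumerate(hidden)) + b2


# Hypothetical fixed weights (assumption: stand-ins for learned parameters).
rng = random.Random(0)
n_agents, n_hidden = 3, 4
w1 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_agents)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
b2 = rng.uniform(-1, 1)

utilities = [0.2, -0.5, 1.0]   # one utility per agent
raised = [0.2, 0.3, 1.0]       # same, but agent 2's utility is higher

q_tot = mix(utilities, w1, b1, w2, b2)
q_tot_raised = mix(raised, w1, b1, w2, b2)
q_tot_free = mix(utilities, w1, b1, w2, b2, monotonic=False)
```

Under the monotonic constraint, `q_tot_raised` is never smaller than `q_tot`. The unconstrained variant can represent a wider class of joint value functions, which is the extra representational capacity the abstract refers to.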
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Mosquito-borne diseases and control
Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · MADDPG
