Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning
Raphaël Avalos, Mathieu Reymond, Ann Nowé, Diederik M. Roijers

TL;DR
This paper introduces Local Advantage Networks (LAN), a novel multi-agent reinforcement learning approach that uses a dueling architecture and centralized critic to improve scalability and performance in cooperative environments.
Contribution
LAN learns decentralized policies with the help of a centralized critic, in contrast to approaches based on factorized value functions, and reports state-of-the-art results on the StarCraft II multi-agent challenge benchmark.
Findings
LAN achieves state-of-the-art performance on the StarCraft II multi-agent challenge benchmark.
LAN is highly scalable with respect to the number of agents.
The centralized critic effectively stabilizes learning by reducing the moving target problem.
Abstract
Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the structure of independent Q-learners, our LAN algorithm takes a radically different approach, leveraging a dueling architecture to learn, for each agent, a decentralized best-response policy via individual advantage functions. The learning is stabilized by a centralized critic whose primary objective is to reduce the moving target problem of the individual advantages. The critic, whose network size is independent of the number of agents, is cast aside after learning. Evaluation on the StarCraft II multi-agent challenge benchmark shows that LAN reaches state-of-the-art performance and is highly scalable with respect to the number of agents, opening up…
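The dueling idea in the abstract can be illustrated with a minimal sketch. It assumes the simplest combination, in which an agent's Q-values are the sum of a centralized state value and that agent's local advantages; the abstract does not spell out the exact form, so this is an assumption. The linear "networks", the weights, and all function names below are hypothetical and stand in for learned models. The sketch shows why the critic can be dropped after learning: the value term does not depend on the agent's action, so the greedy action under the local advantages is the same as under the full Q-values.

```python
from typing import List, Sequence


def local_advantages(weights: Sequence[Sequence[float]],
                     observation: Sequence[float]) -> List[float]:
    """Hypothetical linear advantage head: one score per action,
    computed from the agent's local observation only."""
    return [sum(w * o for w, o in zip(row, observation)) for row in weights]


def centralized_value(critic_weights: Sequence[float],
                      state: Sequence[float]) -> float:
    """Hypothetical linear critic: a single value for the global state.
    Its input is the state, so its size does not grow with the agent count."""
    return sum(w * s for w, s in zip(critic_weights, state))


def agent_q_values(value: float, advantages: Sequence[float]) -> List[float]:
    """Assumed dueling combination: Q_i(a) = V(state) + A_i(observation, a)."""
    return [value + a for a in advantages]


def greedy_action(scores: Sequence[float]) -> int:
    """Index of the highest-scoring action."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy inputs (illustrative numbers, not taken from the paper).
state = [0.5, -1.0, 2.0]                 # global state, seen by the critic only
critic_weights = [1.0, 0.5, 0.25]
observation = [1.0, 2.0]                 # one agent's local observation
agent_weights = [[0.2, -0.1],            # three actions, two observation features
                 [0.4, 0.3],
                 [-0.5, 0.1]]

advantages = local_advantages(agent_weights, observation)
value = centralized_value(critic_weights, state)
q_values = agent_q_values(value, advantages)

# Training would use q_values; execution only needs the local advantages.
action_with_critic = greedy_action(q_values)
action_without_critic = greedy_action(advantages)
```

In this sketch both `action_with_critic` and `action_without_critic` select the same action, which is the property that allows execution to rely on local observations alone.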
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
Methods: Convolution · Q-Learning · Dense Connections · Deep Q-Network
