QPLEX: Duplex Dueling Multi-Agent Q-Learning
Jianhao Wang, Zhizhou Ren, Terry Liu, Yang Yu, Chongjie Zhang

TL;DR
QPLEX introduces a duplex dueling network architecture for multi-agent Q-learning that enforces the IGM principle, leading to improved scalability, stability, and performance in complex multi-agent environments.
Contribution
It presents a novel duplex dueling network architecture that encodes the IGM principle into the neural network, enabling scalable and stable multi-agent reinforcement learning.
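To illustrate how a dueling-style decomposition can build the IGM (Individual-Global-Max) constraint into the architecture itself, here is a minimal sketch for two agents. It is a simplified illustration under stated assumptions, not the paper's network: the function name `dueling_mix` and the random positive weights `lam` are hypothetical, and each agent is assumed to have a unique greedy action. Each agent's Q-values are split into a value (the maximum) and a non-positive advantage, and the advantages are combined with strictly positive weights, so the joint maximum can only occur where every local advantage is zero.

```python
import numpy as np

def dueling_mix(q1, q2, lam):
    """Build a joint Q-table from two agents' local Q-values.

    lam has shape (2, n1, n2) and must be strictly positive; it may
    depend on the joint action (hypothetical weights for illustration).
    """
    v1, v2 = q1.max(), q2.max()      # per-agent values
    a1 = q1 - v1                     # per-agent advantages, all <= 0
    a2 = q2 - v2
    # Positive weights keep every weighted advantage <= 0, so the joint
    # table peaks exactly where both local advantages are zero.
    return (v1 + v2) + lam[0] * a1[:, None] + lam[1] * a2[None, :]

rng = np.random.default_rng(0)
q1 = np.array([1.0, 3.0, 2.0])       # agent 1: greedy action is index 1
q2 = np.array([0.5, 2.0])            # agent 2: greedy action is index 1
lam = rng.uniform(0.1, 5.0, size=(2, 3, 2))

q_tot = dueling_mix(q1, q2, lam)
joint_greedy = tuple(int(i) for i in np.unravel_index(np.argmax(q_tot), q_tot.shape))
local_greedy = (int(np.argmax(q1)), int(np.argmax(q2)))
```

Whatever positive values `lam` takes, `joint_greedy` and `local_greedy` coincide here, which is the consistency the IGM principle asks for.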
Findings
QPLEX outperforms state-of-the-art baselines on StarCraft II micromanagement tasks.
It achieves high sample efficiency and can benefit from offline datasets.
Theoretical analysis shows that QPLEX achieves a complete IGM function class.
Abstract
We explore value-based multi-agent reinforcement learning (MARL) in the popular paradigm of centralized training with decentralized execution (CTDE). A key concept in CTDE is the Individual-Global-Max (IGM) principle, which requires consistency between joint and local action selections to support efficient local decision-making. However, to achieve scalability, existing MARL methods either limit the representation expressiveness of their value function classes or relax the IGM consistency, which may lead to instability or poor performance in complex domains. This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), which uses a duplex dueling network architecture to factorize the joint value function. This duplex dueling structure encodes the IGM principle into the neural network architecture and thus enables efficient…
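The IGM consistency requirement described above can be checked directly on small tabular examples. The sketch below is illustrative only: the helper name `satisfies_igm` and the example tables are assumptions, and ties between maxima are not handled. It compares the greedy joint action of a joint Q-table against the tuple of each agent's greedy local action.

```python
import numpy as np

def satisfies_igm(q_joint, q_locals):
    """Return True if the greedy joint action equals the per-agent greedy actions.

    q_joint:  array with one axis per agent (joint action-value table).
    q_locals: list of 1-D arrays, one per agent (local action values).
    Assumes unique maxima; ties are resolved by NumPy's first-index rule.
    """
    joint_greedy = np.unravel_index(np.argmax(q_joint), q_joint.shape)
    local_greedy = tuple(int(np.argmax(q)) for q in q_locals)
    return tuple(int(a) for a in joint_greedy) == local_greedy

q_locals = [np.array([1.0, 3.0]), np.array([2.0, 0.5])]

# An additive joint table (sum of local values) is consistent with IGM.
q_additive = q_locals[0][:, None] + q_locals[1][None, :]

# A joint table whose best joint action disagrees with the local greedy actions.
q_inconsistent = np.array([[5.0, 0.0],
                           [0.0, 4.0]])

igm_ok = satisfies_igm(q_additive, q_locals)
igm_violated = not satisfies_igm(q_inconsistent, q_locals)
```

In the second table the best joint action is (0, 0), while the agents acting greedily on their own values would pick (1, 0), so decentralized execution would miss the joint optimum.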
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Game Theory and Cooperation · Experimental Behavioral Economics Studies
Methods: Dense Connections · Double Q-learning · Q-Learning · Convolution · Dueling Network
