SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning
Xinghu Yao, Chao Wen, Yuhui Wang, Xiaoyang Tan

TL;DR
SMIX(λ) introduces an off-policy training method for centralized value functions in multi-agent reinforcement learning, utilizing λ-return to improve stability and performance, demonstrated on the SMAC benchmark.
Contribution
It proposes SMIX(λ), a novel off-policy training approach that leverages λ-return for stable, efficient learning of centralized value functions in MARL, connecting to Q(λ) theory.
Findings
Outperforms state-of-the-art MARL methods on the SMAC benchmark
Enhances the performance of CTDE algorithms by improving their centralized value functions (CVFs)
Shares the convergence properties of the Q(λ) approach
Abstract
Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multi-agent reinforcement learning (MARL), as it has to deal with the issue that the joint action space increases exponentially with the number of agents in such scenarios. This paper proposes an approach, named SMIX(λ), to address the issue using an efficient off-policy centralized training method within a flexible learner search space. As importance sampling for such off-policy training is both computationally costly and numerically unstable, we propose to use the λ-return as a proxy to compute the TD error. With this new loss function objective, we adopt a modified QMIX network structure as the base to train our model. By further connecting it with the Q(λ) approach from a unified expectation correction viewpoint, we show that the proposed…
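To make the λ-return idea in the abstract concrete, here is a minimal sketch of how a λ-return target and the resulting TD error could be computed along one sampled trajectory. It is an illustration under stated assumptions, not the authors' implementation: the function name `lambda_returns`, the array layout, and the default values of `gamma` and `lam` are hypothetical, and the bootstrap values are passed in as plain numbers rather than produced by a target mixing network.

```python
import numpy as np

def lambda_returns(rewards, bootstrap_values, gamma=0.99, lam=0.8):
    """Backward-recursive lambda-return for a single trajectory.

    rewards[t]          -- team reward received at step t (length T)
    bootstrap_values[t] -- value estimate for the state reached after step t
                           (length T); assumed to come from some target
                           network, which is not modelled here
    Recursion: G_t = r_t + gamma * ((1 - lam) * V_{t+1} + lam * G_{t+1}),
    with the last step bootstrapping entirely from V_T.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    bootstrap_values = np.asarray(bootstrap_values, dtype=np.float64)
    T = len(rewards)
    returns = np.zeros(T)
    next_return = bootstrap_values[-1]  # no later return exists at the final step
    for t in reversed(range(T)):
        mixed = (1.0 - lam) * bootstrap_values[t] + lam * next_return
        returns[t] = rewards[t] + gamma * mixed
        next_return = returns[t]
    return returns

# Hypothetical numbers for a three-step trajectory.
rewards = [1.0, 0.0, 2.0]
bootstrap_values = [0.5, 0.4, 0.0]
q_taken = np.array([1.2, 0.9, 1.8])  # assumed joint-action value estimates

targets = lambda_returns(rewards, bootstrap_values, gamma=0.99, lam=0.8)
td_errors = targets - q_taken        # the error a squared loss would penalize
```

Setting `lam=0` reduces the target to the one-step TD target, while `lam=1` gives the full discounted return with a single bootstrap at the end of the trajectory; intermediate values blend the two.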
Taxonomy
Topics: Traffic control and management · Reinforcement Learning in Robotics
