SMIX($\lambda$): Enhancing Centralized Value Functions for Cooperative Multi-Agent Reinforcement Learning

Xinghu Yao; Chao Wen; Yuhui Wang; Xiaoyang Tan

arXiv:1911.04094·cs.MA·August 11, 2020

TL;DR

SMIX(λ) introduces an off-policy training method for centralized value functions in multi-agent reinforcement learning, utilizing λ-return to improve stability and performance, demonstrated on the SMAC benchmark.

Contribution

It proposes SMIX(λ), a novel off-policy training approach that leverages λ-return for stable, efficient learning of centralized value functions in MARL, connecting to Q(λ) theory.

Findings

01 Outperforms state-of-the-art MARL methods on SMAC benchmark

02 Enhances performance of CTDE algorithms by improving CVFs

03 Shares convergence properties with Q(λ) approach

Abstract

Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multi-agent reinforcement learning (MARL), as it has to deal with the issue that the joint action space increases exponentially with the number of agents in such scenarios. This paper proposes an approach, named SMIX($\lambda$), to address the issue using an efficient off-policy centralized training method within a flexible learner search space. As importance sampling for such off-policy training is both computationally costly and numerically unstable, we propose to use the $\lambda$-return as a proxy to compute the TD error. With this new loss function objective, we adopt a modified QMIX network structure as the base to train our model. By further connecting it with the $Q(\lambda)$ approach from a unified expectation correction viewpoint, we show that the proposed…
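To make the $\lambda$-return idea in the abstract concrete, the following is a minimal sketch of the textbook recursive $\lambda$-return and the TD error computed against it. It is not taken from the paper or from the chaovven/SMIX repository: the function names `lambda_returns` and `td_errors`, the argument names, and the use of plain Python lists are assumptions made for illustration. In the paper's setting, the bootstrap values would presumably come from the centralized value function; here they are simply passed in as numbers.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Compute lambda-returns for one episode by backward recursion.

    rewards[t]     -- reward received at step t
    next_values[t] -- bootstrap value estimate for the state reached after
                      step t (use 0.0 when that state is terminal)

    Recursion (hypothetical helper, standard lambda-return definition):
        G[t] = r[t] + gamma * ((1 - lam) * V[t+1] + lam * G[t+1])
    with G beyond the last step replaced by the last bootstrap value.
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")
    returns = [0.0] * len(rewards)
    # Past the end of the trajectory, fall back to the final bootstrap value.
    next_return = next_values[-1] if next_values else 0.0
    for t in reversed(range(len(rewards))):
        returns[t] = rewards[t] + gamma * (
            (1.0 - lam) * next_values[t] + lam * next_return
        )
        next_return = returns[t]
    return returns


def td_errors(q_taken, targets):
    """TD error per step: lambda-return target minus the current estimate."""
    return [g - q for q, g in zip(q_taken, targets)]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    next_values = [0.5, 0.5, 0.0]  # last transition ends the episode
    targets = lambda_returns(rewards, next_values, gamma=1.0, lam=0.5)
    print(targets)
    print(td_errors([1.0, 1.0, 1.0], targets))
```

Setting `lam=0.0` reduces the targets to one-step TD targets, while `lam=1.0` with a terminal final state gives Monte Carlo returns; intermediate values interpolate between the two. Note that no importance-sampling ratio appears anywhere in this computation.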

Code & Models

Repositories

chaovven/SMIX (PyTorch, official)

Taxonomy

Topics: Traffic control and management · Reinforcement Learning in Robotics