MCMARL: Parameterizing Value Function via Mixture of Categorical Distributions for Multi-Agent Reinforcement Learning

Jian Zhao; Mingyu Yang; Youpeng Zhao; Xunhan Hu; Wengang Zhou; Jiangcheng Zhu; Houqiang Li

arXiv:2202.10134·cs.LG·May 23, 2022

TL;DR

This paper introduces MCMARL, a novel multi-agent reinforcement learning framework that models value functions as mixtures of categorical distributions to better capture the stochasticity in long-term returns, improving decision-making in complex environments.

Contribution

It proposes a distributional approach to value function parameterization in MARL, extending existing methods with categorical distributions and proving their consistency with the DIGM principle.

Findings

01. MCMARL effectively models stochastic returns in multi-agent tasks.

02. The framework outperforms expectation-based methods in StarCraft II micromanagement.

03. Distributional modeling improves the robustness of value estimates.

Abstract

In cooperative multi-agent tasks, a team of agents jointly interacts with an environment by taking actions, receiving a team reward, and observing the next state. During these interactions, uncertainty in the environment and the reward inevitably induces stochasticity in the long-term returns, and this randomness can be exacerbated as the number of agents increases. However, most existing value-based multi-agent reinforcement learning (MARL) methods ignore such randomness and model only the expectation of the Q-value for both individual agents and the team. Rather than relying on the expectations of the long-term returns, it is preferable to model the stochasticity directly by estimating the returns through distributions. With this motivation, this work proposes a novel value-based MARL framework from a distributional perspective, i.e., parameterizing value function via…
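To make the contrast between an expected return and a return distribution concrete, the following is a minimal sketch of a generic categorical distribution over a fixed grid of return values. It is not the paper's mixture parameterization or its released code; the helper names (`make_atoms`, `expectation`, `variance`) and the numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def make_atoms(v_min: float, v_max: float, n_atoms: int) -> List[float]:
    """Evenly spaced return values (the support of the categorical distribution)."""
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]


def expectation(atoms: Sequence[float], probs: Sequence[float]) -> float:
    """Expected return: what an expectation-based Q-value would store."""
    return sum(z * p for z, p in zip(atoms, probs))


def variance(atoms: Sequence[float], probs: Sequence[float]) -> float:
    """Spread of the return, which is lost when only the expectation is kept."""
    mean = expectation(atoms, probs)
    return sum(p * (z - mean) ** 2 for z, p in zip(atoms, probs))


# Hypothetical support: returns of 0, 2.5, 5, 7.5, and 10.
atoms = make_atoms(0.0, 10.0, 5)

# Two hypothetical actions with the same expected return but different risk.
safe = [0.0, 0.0, 1.0, 0.0, 0.0]   # always returns 5
risky = [0.5, 0.0, 0.0, 0.0, 0.5]  # returns 0 or 10 with equal probability

print(expectation(atoms, safe), variance(atoms, safe))
print(expectation(atoms, risky), variance(atoms, risky))
```

Both actions share the same expectation, so an expectation-only value function cannot tell them apart, while the categorical representation retains the difference in spread.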

Code & Models

Repositories

wudiymy/dqmix (PyTorch, official)

Taxonomy

Topics: Experimental Behavioral Economics Studies · Innovation Diffusion and Forecasting · Evolutionary Game Theory and Cooperation