A Maximum Mutual Information Framework for Multi-Agent Reinforcement Learning

Woojun Kim; Whiyoung Jung; Myungsik Cho; Youngchul Sung

arXiv:2006.02732·cs.MA·June 5, 2020·6 cites

Open Access

TL;DR

This paper introduces a maximum mutual information (MMI) framework for multi-agent reinforcement learning that regularizes the accumulated return with the mutual information between agents' actions, and proposes a practical algorithm, VM3-AC, that outperforms MADDPG and other MARL algorithms on tasks requiring coordination.

Contribution

The paper develops a new MMI-based regularization framework for MARL and introduces VM3-AC, a practical algorithm that improves coordination performance over prior algorithms.

Findings

01

VM3-AC outperforms MADDPG in coordination tasks.

02

The MMI framework effectively encourages coordinated behaviors.

03

VM3-AC also outperforms MARL algorithms other than MADDPG on multi-agent games requiring coordination.

Abstract

In this paper, we propose a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the mutual information between actions. By introducing a latent variable to induce nonzero mutual information between actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution (CTDE). We evaluated VM3-AC for several games requiring coordination, and numerical results show that VM3-AC outperforms MADDPG and other MARL algorithms in multi-agent tasks requiring…
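The abstract's two central ideas, a shared latent variable that makes the mutual information between actions nonzero and a variational lower bound on that mutual information, can be illustrated with a toy discrete example. The sketch below is not the paper's algorithm: the two-agent, two-action setup, the policy tables, and all function names are hypothetical, and the bound shown is the standard one, I(a1; a2) >= H(a1) + E[log q(a1 | a2)], which may differ in detail from the bound derived in the paper.

```python
import math
from itertools import product


def joint_from_latent(p_z, pi_1, pi_2):
    """Joint action distribution p(a1, a2) = sum_z p(z) pi_1(a1|z) pi_2(a2|z).

    pi_k[z][a] is agent k's probability of action a given latent value z.
    """
    n1, n2 = len(pi_1[0]), len(pi_2[0])
    joint = [[0.0] * n2 for _ in range(n1)]
    for z, pz in enumerate(p_z):
        for a1, a2 in product(range(n1), range(n2)):
            joint[a1][a2] += pz * pi_1[z][a1] * pi_2[z][a2]
    return joint


def mutual_information(joint):
    """Exact mutual information I(a1; a2) in nats."""
    p1 = [sum(row) for row in joint]
    p2 = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a1, row in enumerate(joint):
        for a2, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p1[a1] * p2[a2]))
    return mi


def variational_lower_bound(joint, q):
    """H(a1) + E_{p(a1,a2)}[log q(a1|a2)], where q[a2][a1] approximates p(a1|a2)."""
    p1 = [sum(row) for row in joint]
    entropy = -sum(p * math.log(p) for p in p1 if p > 0)
    expected_log_q = sum(
        p * math.log(q[a2][a1])
        for a1, row in enumerate(joint)
        for a2, p in enumerate(row)
        if p > 0
    )
    return entropy + expected_log_q


# Hypothetical setup: a uniform binary latent variable shared by both agents.
p_z = [0.5, 0.5]
follows_latent = [[0.9, 0.1], [0.1, 0.9]]  # action tends to match z
ignores_latent = [[0.5, 0.5], [0.5, 0.5]]  # action independent of z

joint_shared = joint_from_latent(p_z, follows_latent, follows_latent)
joint_indep = joint_from_latent(p_z, ignores_latent, ignores_latent)

mi_shared = mutual_information(joint_shared)
mi_indep = mutual_information(joint_indep)

# Exact posterior p(a1|a2) makes the bound tight; any other q gives a smaller value.
p2 = [sum(col) for col in zip(*joint_shared)]
q_exact = [[joint_shared[a1][a2] / p2[a2] for a1 in range(2)] for a2 in range(2)]
q_loose = [[0.7, 0.3], [0.3, 0.7]]

bound_exact = variational_lower_bound(joint_shared, q_exact)
bound_loose = variational_lower_bound(joint_shared, q_loose)

print("MI with shared latent:   ", mi_shared)
print("MI without shared latent:", mi_indep)
print("bound with exact q:      ", bound_exact)
print("bound with loose q:      ", bound_loose)
```

In this toy setting, policies that ignore the latent variable yield zero mutual information, while policies conditioned on it yield a positive value. A learned q would sit between the loose and exact cases, which is why maximizing the bound over q is a tractable surrogate for the mutual information term.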

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems