A Maximum Mutual Information Framework for Multi-Agent Reinforcement Learning
Woojun Kim, Whiyoung Jung, Myungsik Cho, Youngchul Sung

TL;DR
This paper introduces a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) in which agents learn coordinated behaviors by regularizing the accumulated return with the mutual information between their actions. It also proposes a practical algorithm, VM3-AC, that outperforms MADDPG and other MARL algorithms on tasks requiring coordination.
Contribution
The paper develops a new MMI-based regularization framework for MARL and introduces VM3-AC, a practical algorithm that improves coordination performance over prior algorithms.
Findings
VM3-AC outperforms MADDPG in coordination tasks.
The MMI framework effectively encourages coordinated behaviors.
VM3-AC also outperforms other MARL algorithms on the evaluated multi-agent games requiring coordination.
Abstract
In this paper, we propose a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the mutual information between actions. By introducing a latent variable to induce nonzero mutual information between actions and applying a variational bound, we derive a tractable lower bound on the considered MMI-regularized objective function. Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows centralized learning with decentralized execution (CTDE). We evaluated VM3-AC for several games requiring coordination, and numerical results show that VM3-AC outperforms MADDPG and other MARL algorithms in multi-agent tasks requiring…
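The role of the latent variable and the variational bound can be illustrated with a small discrete toy problem. The sketch below is not VM3-AC and is not taken from the paper: the two-agent setup, the function names (`mutual_information`, `joint_from_latent`, `variational_lower_bound`), and all probability tables are hypothetical. It also assumes the standard variational bound I(A1; A2) >= H(A1) + E[log q(a1 | a2)], since this summary does not give the exact form the paper uses. It shows that policies conditioned on a shared latent variable z yield nonzero mutual information between actions, that independent policies yield zero, and that the bound never exceeds the true value.

```python
import numpy as np

def mutual_information(joint):
    """I(A1; A2) in nats for a joint probability table p(a1, a2)."""
    p1 = joint.sum(axis=1, keepdims=True)  # marginal p(a1), shape (A1, 1)
    p2 = joint.sum(axis=0, keepdims=True)  # marginal p(a2), shape (1, A2)
    mask = joint > 0                       # skip zero cells to avoid log(0)
    return float(np.sum(joint[mask] * np.log(joint[mask] / (p1 @ p2)[mask])))

def joint_from_latent(p_z, pi1, pi2):
    """p(a1, a2) = sum_z p(z) pi1(a1|z) pi2(a2|z); pi arrays have shape (Z, A)."""
    return np.einsum("z,za,zb->ab", p_z, pi1, pi2)

def variational_lower_bound(joint, q):
    """H(A1) + E[log q(a1|a2)], where q has shape (A2, A1) and rows sum to 1."""
    p1 = joint.sum(axis=1)
    entropy = -np.sum(p1[p1 > 0] * np.log(p1[p1 > 0]))
    mask = joint > 0
    expected_log_q = np.sum(joint[mask] * np.log(q.T[mask]))
    return float(entropy + expected_log_q)

# Hypothetical toy setup: a shared binary latent z and two binary-action agents.
p_z = np.array([0.5, 0.5])
pi_latent = np.array([[0.9, 0.1],   # action probabilities given z = 0
                      [0.1, 0.9]])  # action probabilities given z = 1
pi_ignores_z = np.array([[0.5, 0.5],
                         [0.5, 0.5]])  # same distribution for every z

joint_latent = joint_from_latent(p_z, pi_latent, pi_latent)
joint_indep = joint_from_latent(p_z, pi_ignores_z, pi_ignores_z)

mi_latent = mutual_information(joint_latent)  # strictly positive
mi_indep = mutual_information(joint_indep)    # zero: actions are independent

# A deliberately crude approximate posterior q(a1 | a2) gives a loose bound.
q_crude = np.array([[0.7, 0.3],
                    [0.3, 0.7]])
bound_crude = variational_lower_bound(joint_latent, q_crude)

# The exact posterior p(a1 | a2) makes the bound tight.
q_exact = (joint_latent / joint_latent.sum(axis=0, keepdims=True)).T
bound_exact = variational_lower_bound(joint_latent, q_exact)

print(f"MI with shared latent: {mi_latent:.4f}")
print(f"MI without latent:     {mi_indep:.4f}")
print(f"Crude bound:           {bound_crude:.4f}")
print(f"Exact-posterior bound: {bound_exact:.4f}")
```

In this toy case the bound can be evaluated without knowing the true conditional p(a1 | a2), which is the general reason a variational bound makes a mutual information term tractable to optimize.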
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Distributed Control Multi-Agent Systems
