Policy Diversity for Cooperative Agents
Mingxi Tan, Andong Tian, Ludovic Denoyer

TL;DR
This paper introduces a novel method called Moment-Matching Policy Diversity for generating diverse cooperative policies in multi-agent reinforcement learning, addressing the lack of existing approaches tailored for multi-agent domains.
Contribution
It proposes a simple, theoretically grounded approach to produce significantly different team policies by regularizing trajectory distribution differences using maximum mean discrepancy.
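To make the maximum mean discrepancy (MMD) term concrete, here is a minimal sketch of a biased MMD² estimate between two batches of samples. It is an illustration only, not the authors' implementation. The Gaussian kernel, the `bandwidth` value, the function names, and the use of fixed-size action vectors as stand-ins for trajectory features are all assumptions made for this example.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Pairwise Gaussian kernel between rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 between two sample sets."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical data: action samples drawn from three "policies".
rng = np.random.default_rng(0)
actions_a = rng.normal(0.0, 1.0, size=(256, 4))
actions_b = rng.normal(0.0, 1.0, size=(256, 4))  # same distribution as a
actions_c = rng.normal(2.0, 1.0, size=(256, 4))  # shifted distribution

same = mmd_squared(actions_a, actions_b)
different = mmd_squared(actions_a, actions_c)
print(f"MMD^2 (similar policies):   {same:.4f}")
print(f"MMD^2 (different policies): {different:.4f}")
```

A diversity regularizer in this spirit would reward a new policy for a larger MMD² against previously found policies. How that term enters the training objective is not specified in this summary.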
Findings
Effective in generating diverse policies for cooperative multi-agent tasks
Demonstrated success on a challenging team-based shooter environment
Provides a constrained optimization framework for policy diversity
Abstract
Standard cooperative multi-agent reinforcement learning (MARL) methods aim to find the optimal cooperative team policy for completing a task. However, there may be several different ways of cooperating, and domain experts often need access to them. Identifying a set of significantly different policies can therefore reduce the task complexity for these experts. Unfortunately, effective policy diversity approaches designed specifically for the multi-agent domain are generally lacking. In this work, we propose a method called Moment-Matching Policy Diversity to address this gap. The method generates team policies that differ to varying degrees by formalizing the difference between team policies as the difference in the actions of selected agents across policies. Theoretically, we show that our method is a simple way to implement a constrained optimization problem that…
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Software Engineering Research
