ME-IGM: Individual-Global-Max in Maximum Entropy Multi-Agent Reinforcement Learning
Wen-Tse Chen, Yuxuan Li, Shiyu Huang, Jiayu Chen, Jeff Schneider

TL;DR
This paper introduces ME-IGM, a maximum entropy multi-agent reinforcement learning algorithm that aligns local and global policies to improve exploration and credit assignment, demonstrating superior performance in complex cooperative tasks.
Contribution
The paper proposes a novel order-preserving transformation to address misalignment issues in maximum entropy MARL, enabling compatibility with any IGM-compliant credit assignment mechanism.
Findings
ME-IGM achieves state-of-the-art results in 17 scenarios.
Empirical evaluation confirms improved exploration and coordination.
Variants ME-QMIX and ME-QPLEX outperform existing methods.
Abstract
Multi-agent credit assignment is a fundamental challenge for cooperative multi-agent reinforcement learning (MARL), where a team of agents learn from shared reward signals. The Individual-Global-Max (IGM) condition is a widely used principle for multi-agent credit assignment, requiring that the joint action determined by individual Q-functions maximizes the global Q-value. Meanwhile, the principle of maximum entropy has been leveraged to enhance exploration in MARL. However, we identify a critical limitation in existing maximum entropy MARL methods: a misalignment arises between local policies and the joint policy that maximizes the global Q-value, leading to violations of the IGM condition. To address this misalignment, we propose an order-preserving transformation. Building on it, we introduce ME-IGM, a novel maximum entropy MARL algorithm compatible with any credit assignment…
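The IGM condition described in the abstract can be checked numerically on a toy problem. The sketch below is an illustration only, not the paper's method: the per-agent Q-values, the weights, and the bias are made-up numbers, and the mixer is an assumed weighted sum with positive weights (a simple monotonic mixer in the spirit of QMIX). It compares the joint action formed by each agent's greedy choice against the joint action found by brute-force search over the global Q-value.

```python
import itertools

# Hypothetical per-agent Q-values for one state: q_local[i][a] is agent i's
# utility for action a. The numbers are made up for illustration.
q_local = [
    [1.0, 3.0, 2.0],  # agent 0
    [0.5, 0.2, 0.9],  # agent 1
]

# Assumed monotonic mixer: a weighted sum with positive weights.
# This is a stand-in for a learned mixing network, not the paper's model.
weights = [2.0, 0.5]
bias = -1.0


def q_global(joint_action):
    """Global Q-value of a joint action under the assumed monotonic mixer."""
    return bias + sum(w * q[a] for w, q, a in zip(weights, q_local, joint_action))


def greedy_local():
    """Joint action formed by each agent maximizing its own Q-function."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_local)


def greedy_global():
    """Joint action that maximizes the global Q-value, by brute-force search."""
    joint_actions = itertools.product(*(range(len(q)) for q in q_local))
    return max(joint_actions, key=q_global)


def igm_holds():
    """True when the locally greedy joint action also maximizes the global Q."""
    return greedy_local() == greedy_global()


if __name__ == "__main__":
    print("local greedy joint action:", greedy_local())
    print("global greedy joint action:", greedy_global())
    print("IGM holds:", igm_holds())
```

Because the assumed mixer is strictly increasing in every agent's Q-value, the two joint actions coincide here. The brute-force search is only feasible for tiny action spaces, which is why IGM-compliant factorizations let each agent act on its own Q-function instead.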
Taxonomy
Topics: Manufacturing Process and Optimization · Advanced Control Systems Optimization · Iterative Learning Control Systems
Methods: ALIGN
