ME-IGM: Individual-Global-Max in Maximum Entropy Multi-Agent Reinforcement Learning

Wen-Tse Chen; Yuxuan Li; Shiyu Huang; Jiayu Chen; Jeff Schneider

arXiv:2406.13930 · cs.LG · February 4, 2026 · 1 citation

TL;DR

This paper introduces ME-IGM, a maximum entropy multi-agent reinforcement learning algorithm that aligns local and global policies to improve exploration and credit assignment, demonstrating superior performance in complex cooperative tasks.

Contribution

The paper proposes a novel order-preserving transformation to address misalignment issues in maximum entropy MARL, enabling compatibility with any IGM-compliant credit assignment mechanism.

Findings

01. ME-IGM achieves state-of-the-art results in 17 scenarios.

02. Empirical evaluation confirms improved exploration and coordination.

03. Variants ME-QMIX and ME-QPLEX outperform existing methods.

Abstract

Multi-agent credit assignment is a fundamental challenge for cooperative multi-agent reinforcement learning (MARL), where a team of agents learns from shared reward signals. The Individual-Global-Max (IGM) condition is a widely used principle for multi-agent credit assignment, requiring that the joint action determined by individual Q-functions maximizes the global Q-value. Meanwhile, the principle of maximum entropy has been leveraged to enhance exploration in MARL. However, we identify a critical limitation in existing maximum entropy MARL methods: a misalignment arises between local policies and the joint policy that maximizes the global Q-value, leading to violations of the IGM condition. To address this misalignment, we propose an order-preserving transformation. Building on it, we introduce ME-IGM, a novel maximum entropy MARL algorithm compatible with any credit assignment…
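The IGM condition described in the abstract can be illustrated with a small tabular check. The sketch below is not taken from the paper or its repository: the function name `satisfies_igm`, the two-agent setup, and all Q-values are hypothetical, and additive (VDN-style) mixing is used only as a simple example of a mixer that is monotonic in each local Q-value.

```python
import numpy as np

def satisfies_igm(q_joint, q_locals):
    """Return True if the greedy local actions jointly maximize q_joint.

    q_joint:  array of shape (n_actions_1, ..., n_actions_k), the global Q-table.
    q_locals: list of k 1-D arrays, one local Q-vector per agent.
    (Hypothetical helper for illustration only.)
    """
    # Each agent picks its own greedy action from its local Q-values.
    greedy = tuple(int(np.argmax(q)) for q in q_locals)
    # IGM holds if that joint action attains the maximum of the global Q-table.
    return bool(np.isclose(q_joint[greedy], q_joint.max()))

# Two agents, three actions each (made-up values).
q1 = np.array([1.0, 3.0, 2.0])
q2 = np.array([0.5, 0.1, 2.0])

# Additive mixing is monotonic in each local Q, so the local argmaxes
# coincide with the joint argmax.
q_additive = q1[:, None] + q2[None, :]
igm_additive = satisfies_igm(q_additive, [q1, q2])

# An arbitrary joint table need not agree with the local greedy actions.
q_arbitrary = q_additive.copy()
q_arbitrary[0, 0] = 10.0
igm_arbitrary = satisfies_igm(q_arbitrary, [q1, q2])

print(igm_additive, igm_arbitrary)
```

The second case shows the kind of disagreement between local greedy choices and the global maximizer that IGM rules out; it does not reproduce the specific misalignment the paper identifies in maximum entropy methods, nor its order-preserving transformation.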

Code & Models

Repositories

WentseChen/Soft-QMIX (PyTorch, official)

Taxonomy

Topics: Manufacturing Process and Optimization · Advanced Control Systems Optimization · Iterative Learning Control Systems

Methods: ALIGN