Heterogeneous Multi-Agent Reinforcement Learning via Mirror Descent Policy Optimization
Mohammad Mehdi Nasiri, Mansoor Rezghi

TL;DR
This paper introduces HAMDPO, a mirror descent-based policy optimization algorithm for cooperative heterogeneous multi-agent reinforcement learning that improves the stability and performance of policy updates across diverse tasks.
Contribution
It extends mirror descent methods to heterogeneous MARL, enabling efficient, stable policy updates for agents with different capabilities and action spaces.
Findings
HAMDPO outperforms HATRPO and HAPPO on Multi-Agent MuJoCo and StarCraft II tasks.
The algorithm effectively handles both continuous and discrete action spaces.
Results demonstrate improved stability and performance in cooperative MARL settings.
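In a KL-regularized mirror descent step, supporting both action types comes down to measuring how far each agent's updated policy moves from its previous one. The sketch below is an illustration, not the authors' code. It assumes the common parameterizations: a categorical distribution for discrete actions and a diagonal Gaussian for continuous actions. All function names, including the per-agent `mirror_descent_objective`, are hypothetical.

```python
import math


def categorical_kl(new_probs, old_probs):
    """KL(new || old) for a discrete-action policy.
    Assumes old_probs are strictly positive; zero-probability new actions contribute 0."""
    return sum(p * math.log(p / q) for p, q in zip(new_probs, old_probs) if p > 0)


def diag_gaussian_kl(new_mean, new_std, old_mean, old_std):
    """KL(new || old) for a diagonal-Gaussian continuous-action policy,
    summed over independent action dimensions."""
    total = 0.0
    for m1, s1, m2, s2 in zip(new_mean, new_std, old_mean, old_std):
        total += math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5
    return total


def mirror_descent_objective(surrogate_advantage, kl, step_size):
    """Hypothetical per-agent quantity to maximize: the advantage surrogate
    minus the KL proximal penalty scaled by 1 / step_size (assumed form)."""
    return surrogate_advantage - kl / step_size
```

Because only the divergence term depends on the action space, the same objective can serve a discrete agent (via `categorical_kl`) and a continuous agent (via `diag_gaussian_kl`) in the same team.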
Abstract
This paper presents an extension of the Mirror Descent method to overcome challenges in cooperative Multi-Agent Reinforcement Learning (MARL) settings, where agents have varying abilities and individual policies. The proposed Heterogeneous-Agent Mirror Descent Policy Optimization (HAMDPO) algorithm utilizes the multi-agent advantage decomposition lemma to enable efficient policy updates for each agent while ensuring overall performance improvements. By iteratively updating agent policies through an approximate solution of the trust-region problem, HAMDPO guarantees stability and improves performance. Moreover, the HAMDPO algorithm is capable of handling both continuous and discrete action spaces for heterogeneous agents in various MARL problems. We evaluate HAMDPO on Multi-Agent MuJoCo and StarCraftII tasks, demonstrating its superiority over state-of-the-art algorithms such as HATRPO…
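The sequential, per-agent update described in the abstract can be illustrated with an exact tabular analogue. This is a minimal sketch under stated assumptions, not the authors' implementation. The setup (`REWARD`, `step_size`, and all function names) is hypothetical:

- A one-state cooperative game with two heterogeneous agents that have 3 and 2 actions.
- A random agent order in each iteration.
- A mirror descent step with a KL(π || π_old) proximal term. For tabular policies this step has the closed form π(a) ∝ π_old(a)·exp(t·score(a)).

Each agent's score averages the team reward over the *new* policies of agents updated earlier in the same round and the *old* policies of the rest. This plays the role of the multi-agent advantage decomposition. The practical algorithm instead uses sampled advantages, neural policies, and several gradient steps per agent.

```python
import math
import random

# Hypothetical cooperative matrix game: agent 0 has 3 actions, agent 1 has 2
# (heterogeneous action spaces). REWARD[a0][a1] is the shared team reward.
REWARD = [
    [1.0, 0.0],
    [0.0, 2.0],
    [0.5, 0.5],
]


def softmax_tilt(old_probs, scores, step_size):
    """Closed-form mirror descent step with a KL(pi || pi_old) proximal term:
    argmax_pi E_pi[score] - (1 / step_size) * KL(pi || pi_old)
    is pi(a) proportional to pi_old(a) * exp(step_size * score(a))."""
    m = max(scores)  # shift by the max for numerical stability
    weights = [p * math.exp(step_size * (s - m)) for p, s in zip(old_probs, scores)]
    total = sum(weights)
    return [w / total for w in weights]


def team_return(pi0, pi1):
    """Expected shared reward under independent (decentralized) policies."""
    return sum(pi0[a0] * pi1[a1] * REWARD[a0][a1]
               for a0 in range(len(pi0)) for a1 in range(len(pi1)))


def conditional_scores(agent, policies):
    """Expected reward of each of `agent`'s actions, averaged over the other
    agent's current policy (already updated if it moved earlier this round).
    A baseline is unnecessary: the tilt is invariant to constant shifts."""
    other = policies[1 - agent]
    if agent == 0:
        return [sum(other[a1] * REWARD[a0][a1] for a1 in range(len(other)))
                for a0 in range(len(REWARD))]
    return [sum(other[a0] * REWARD[a0][a1] for a0 in range(len(other)))
            for a1 in range(len(REWARD[0]))]


def sequential_mirror_descent(iterations=20, step_size=1.0, seed=0):
    rng = random.Random(seed)
    policies = [[1 / 3] * 3, [1 / 2] * 2]  # uniform initial policies
    history = [team_return(*policies)]
    for _ in range(iterations):
        order = [0, 1]
        rng.shuffle(order)  # random agent ordering each iteration
        for agent in order:
            scores = conditional_scores(agent, policies)
            policies[agent] = softmax_tilt(policies[agent], scores, step_size)
        history.append(team_return(*policies))
    return policies, history
```

Each agent's step maximizes its score minus a nonnegative KL penalty, and keeping the old policy incurs zero penalty. So in this exact setting the team return never decreases between iterations. Running `sequential_mirror_descent()` should show `history` rising from 2/3 toward the best joint payoff of 2.0.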
Taxonomy
Topics: Reinforcement Learning in Robotics
