Improving monotonic optimization in heterogeneous multi-agent reinforcement learning with optimal marginal deterministic policy gradient
Xiaoyang Yu, Youfang Lin, Shuo Wang, Sheng Han

TL;DR
This paper introduces the OMDPG algorithm for heterogeneous multi-agent reinforcement learning, effectively balancing monotonic improvement with partial parameter sharing through optimal marginal policies and a novel critic architecture.
Contribution
The paper proposes the OMDPG algorithm that combines optimal marginal Q-functions with a new critic and actor architecture to improve cooperative multi-agent learning.
Findings
OMDPG outperforms state-of-the-art MARL algorithms in SMAC and MAMuJoCo environments.
The proposed method maintains monotonic improvement while effectively using partial parameter sharing.
The new critic and actor design stabilizes training and enhances cooperative performance.
Abstract
In heterogeneous multi-agent reinforcement learning (MARL), achieving monotonic improvement plays a pivotal role in enhancing performance. The HAPPO algorithm proposes a feasible solution by introducing a sequential update scheme, which requires independent learning with No Parameter-sharing (NoPS). However, heterogeneous MARL generally requires Partial Parameter-sharing (ParPS) based on agent grouping to achieve high cooperative performance. Our experiments prove that directly combining ParPS with the sequential update scheme leads to the policy updating baseline drift problem, thereby failing to achieve improvement. To solve the conflict between monotonic improvement and ParPS, we propose the Optimal Marginal Deterministic Policy Gradient (OMDPG) algorithm. First, we replace the sequentially computed Q-function with the Optimal Marginal Q (OMQ) function…
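To make the NoPS and ParPS settings named in the abstract concrete, here is a minimal sketch of how policy parameters might be allocated under each scheme. It is an illustration only, not the OMDPG implementation: the function names `build_policies` and `act`, the linear policy, and the unit types `"marine"` and `"medivac"` are all assumptions introduced for this example.

```python
import random

def build_policies(agent_types, obs_dim, act_dim, sharing="ParPS", seed=0):
    """Allocate toy linear-policy weights per agent (NoPS) or per group (ParPS).

    Hypothetical helper: returns (params, keys), where keys[i] names the
    parameter set used by agent i.
    """
    rng = random.Random(seed)

    def new_params():
        # One weight row per action dimension.
        return [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                for _ in range(act_dim)]

    if sharing == "NoPS":
        # Every agent owns an independent parameter set.
        keys = [f"agent_{i}" for i in range(len(agent_types))]
    elif sharing == "ParPS":
        # Agents of the same type (group) share one parameter set.
        keys = [f"group_{t}" for t in agent_types]
    else:
        raise ValueError(f"unknown sharing scheme: {sharing}")

    params = {}
    for key in keys:
        if key not in params:
            params[key] = new_params()
    return params, keys

def act(params, keys, agent_idx, obs):
    """Deterministic linear action for one agent (toy stand-in for an actor)."""
    weights = params[keys[agent_idx]]
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]

# Illustrative grouping: two agents of one type, one agent of another.
agent_types = ["marine", "marine", "medivac"]
nops_params, nops_keys = build_policies(agent_types, 4, 2, sharing="NoPS")
parps_params, parps_keys = build_policies(agent_types, 4, 2, sharing="ParPS")
```

Under ParPS, updating the parameters of one agent also changes the policy of every other agent in its group, whereas under NoPS each update touches a single agent. That coupling is what makes combining grouping with a one-agent-at-a-time update order non-trivial.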
Taxonomy
Topics: Reinforcement Learning in Robotics
