Improving monotonic optimization in heterogeneous multi-agent reinforcement learning with optimal marginal deterministic policy gradient

Xiaoyang Yu; Youfang Lin; Shuo Wang; Sheng Han

arXiv:2507.09989 · cs.AI · July 15, 2025

Open Access

TL;DR

This paper introduces the Optimal Marginal Deterministic Policy Gradient (OMDPG) algorithm for heterogeneous multi-agent reinforcement learning (MARL). It reconciles monotonic improvement with partial parameter sharing (ParPS) by using optimal marginal Q-functions and a novel critic architecture.

Contribution

The paper proposes the OMDPG algorithm that combines optimal marginal Q-functions with a new critic and actor architecture to improve cooperative multi-agent learning.

Findings

01

OMDPG outperforms state-of-the-art MARL algorithms in SMAC and MAMuJoCo environments.

02

The proposed method maintains monotonic improvement while effectively using partial parameter sharing.

03

The new critic and actor design stabilizes training and enhances cooperative performance.

Abstract

In heterogeneous multi-agent reinforcement learning (MARL), achieving monotonic improvement plays a pivotal role in enhancing performance. The HAPPO algorithm offers a feasible solution by introducing a sequential update scheme, which requires independent learning with No Parameter-sharing (NoPS). However, heterogeneous MARL generally requires Partial Parameter-sharing (ParPS) based on agent grouping to achieve high cooperative performance. Our experiments show that directly combining ParPS with the sequential update scheme leads to the policy updating baseline drift problem, so monotonic improvement is no longer achieved. To resolve the conflict between monotonic improvement and ParPS, we propose the Optimal Marginal Deterministic Policy Gradient (OMDPG) algorithm. First, we replace the sequentially computed $Q_{\psi}^{s}(s, a_{1:i})$ with the Optimal Marginal Q (OMQ) function…
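The conflict between the sequential update scheme and ParPS can be pictured with a toy example. The sketch below is not OMDPG and not the authors' code. It is a hypothetical single-state setup with tabular softmax policies, a scalar joint advantage, and one unclipped gradient step per agent (PPO clipping is omitted). Following HAPPO, agents are updated in a given order (HAPPO draws a random permutation), and each agent scales the advantage by the product of the probability ratios of the agents updated before it. The helpers `softmax` and `sequential_update` are illustrative names. Reading "baseline drift" as the ratios of already-updated agents changing when a later agent in the same parameter-sharing group updates is our interpretation; the excerpt above does not define the term.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequential_update(params, groups, actions, advantage, order, lr=0.5):
    """Toy HAPPO-style sequential update in a single state (hypothetical helper).

    params:    one logit vector per parameter group (tabular softmax policies)
    groups:    groups[i] is the parameter group used by agent i
               (NoPS: all distinct; ParPS: grouped agents share an index)
    actions:   sampled joint action, one action index per agent
    advantage: scalar joint advantage A(s, a) for this sample
    order:     permutation in which agents are updated

    Returns (recorded_m, actual_m): the compound ratio accumulated by the
    sequential scheme, and the product of every agent's ratio re-evaluated
    under the final policies.
    """
    params = [list(p) for p in params]  # copy so the caller's lists are untouched
    # pi_old(a_i) for each agent, evaluated before any update
    old = [softmax(params[g])[a] for g, a in zip(groups, actions)]
    recorded_m = 1.0
    for i in order:
        g, a = groups[i], actions[i]
        weight = recorded_m * advantage  # HAPPO-style M * A for agent i
        probs = softmax(params[g])
        # One gradient-ascent step on weight * log pi_i(a_i); for softmax
        # logits the gradient is weight * (onehot(a) - probs).
        for k in range(len(probs)):
            params[g][k] += lr * weight * ((1.0 if k == a else 0.0) - probs[k])
        # Ratio recorded right after agent i's own update
        recorded_m *= softmax(params[g])[a] / old[i]
    # Ratios re-evaluated once every agent has been updated
    actual_m = 1.0
    for i in order:
        actual_m *= softmax(params[groups[i]])[actions[i]] / old[i]
    return recorded_m, actual_m


zeros = [0.0, 0.0, 0.0]
# NoPS: three agents, three separate parameter vectors
print(sequential_update([zeros] * 3, [0, 1, 2], [0, 1, 0], 1.0, [0, 1, 2]))
# ParPS: agents 0 and 1 share parameter group 0
print(sequential_update([zeros] * 2, [0, 0, 1], [0, 1, 0], 1.0, [0, 1, 2]))
```

With NoPS each parameter vector changes only once, so the compound ratio recorded during the sweep equals the product of the final ratios. When agents 0 and 1 share group 0 and pick different actions, agent 1's step also moves agent 0's policy after agent 0's ratio was already recorded, so the two numbers diverge and later agents were weighted against a stale baseline.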

Taxonomy

Topics: Reinforcement Learning in Robotics