PMAT: Optimizing Action Generation Order in Multi-Agent Reinforcement Learning

Kun Hu; Muning Wen; Xihuai Wang; Shao Zhang; Yiwei Shi; Minne Li; Minglong Li; Ying Wen

arXiv:2502.16496 · cs.LG · February 25, 2025

TL;DR

This paper introduces PMAT, a novel multi-agent reinforcement learning algorithm that optimizes agent decision order using Plackett-Luce sampling, significantly improving coordination efficiency in complex multi-agent environments.

Contribution

The paper proposes AGPS for decision order optimization and integrates it into PMAT, advancing sequential decision-making in MARL with better dependency management.

Findings

01 PMAT outperforms state-of-the-art algorithms on benchmarks.

02 AGPS effectively manages agent decision order.

03 Experiments demonstrate enhanced coordination efficiency.

Abstract

Multi-agent reinforcement learning (MARL) faces challenges in coordinating agents due to complex interdependencies within multi-agent systems. Most MARL algorithms use the simultaneous decision-making paradigm but ignore the action-level dependencies among agents, which reduces coordination efficiency. In contrast, the sequential decision-making paradigm provides finer-grained supervision for agent decision order, presenting the potential for handling dependencies via better decision order management. However, determining the optimal decision order remains a challenge. In this paper, we introduce Action Generation with Plackett-Luce Sampling (AGPS), a novel mechanism for agent decision order optimization. We model the order determination task as a Plackett-Luce sampling process to address issues such as ranking instability and vanishing gradient during the network training process. AGPS…
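To make the idea of treating order determination as Plackett-Luce sampling concrete, here is a minimal sketch in plain Python. It is not the authors' AGPS implementation: the function names and the hand-set per-agent scores are assumptions for illustration, and in a learned system the scores would presumably come from a network. The sketch draws a decision order by repeatedly picking one of the remaining agents with probability proportional to the exponential of its score, and it also returns the log-probability of the drawn order.

```python
import math
import random


def sample_plackett_luce(scores, rng=random):
    """Sample an agent ordering from a Plackett-Luce model.

    scores: one real-valued score (logit) per agent; hypothetical input.
    Returns (order, log_prob), where order is a list of agent indices.
    """
    remaining = list(range(len(scores)))
    order = []
    log_prob = 0.0
    while remaining:
        # Softmax over the agents not yet placed (max subtracted for stability).
        top = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - top) for i in remaining]
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        log_prob += math.log(weights[pick] / sum(weights))
        order.append(remaining.pop(pick))
    return order, log_prob


def plackett_luce_log_prob(scores, order):
    """Log-probability of a given ordering under the same model."""
    remaining = list(order)
    log_prob = 0.0
    for agent in order:
        top = max(scores[i] for i in remaining)
        log_norm = top + math.log(sum(math.exp(scores[i] - top) for i in remaining))
        log_prob += scores[agent] - log_norm
        remaining.remove(agent)
    return log_prob


if __name__ == "__main__":
    agent_scores = [2.0, 0.5, -1.0]  # illustrative values, not from the paper
    order, lp = sample_plackett_luce(agent_scores, random.Random(0))
    print("decision order:", order, "log-prob:", lp)
```

Because the log-probability of a sampled order is available in closed form, it could in principle be used in a score-function (policy-gradient style) objective; whether and how AGPS does so is not stated in the truncated abstract.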

Code & Models

Repositories

nudt-bi-marl/pmat
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer