JointPPO: Diving Deeper into the Effectiveness of PPO in Multi-Agent Reinforcement Learning
Chenxing Liu, Guizhong Liu

TL;DR
JointPPO introduces a Transformer-based centralized training method for multi-agent reinforcement learning, effectively managing large joint action spaces and outperforming existing baselines in complex environments.
Contribution
The paper proposes a novel Centralized Training with Centralized Execution (CTCE) approach that applies PPO with a Transformer policy network, recasting joint decision-making as a sequence generation task for better scalability.
Findings
JointPPO outperforms strong baselines on the SMAC benchmark.
Transformer-based policy effectively handles large joint action spaces.
Ablation studies highlight key factors influencing performance.
Abstract
While Centralized Training with Decentralized Execution (CTDE) has become the prevailing paradigm in Multi-Agent Reinforcement Learning (MARL), it may not be suitable for scenarios in which agents can fully communicate and share observations with each other. Fully centralized methods, also known as Centralized Training with Centralized Execution (CTCE) methods, can fully utilize the observations of all agents by treating the entire system as a single agent. However, traditional CTCE methods suffer from scalability issues due to the exponential growth of the joint action space. To address these challenges, in this paper we propose JointPPO, a CTCE method that uses Proximal Policy Optimization (PPO) to directly optimize the joint policy of the multi-agent system. JointPPO decomposes the joint policy into conditional probabilities, transforming the decision-making process into a sequence…
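The decomposition mentioned in the abstract can be illustrated with a small sketch. It assumes the standard chain-rule factorization, in which each agent's action is conditioned on the actions already chosen by the preceding agents; the abstract is truncated, so the exact conditioning and agent ordering used by JointPPO are not confirmed here. The function `toy_conditional_logits` is a hypothetical stand-in for a learned network (it is not the paper's Transformer), and observations are omitted for brevity. The point is only that a joint space of `n_actions ** n_agents` actions can be sampled with `n_agents` small categorical draws, and that the joint log-probability is the sum of the conditional log-probabilities.

```python
import itertools
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_conditional_logits(agent_idx, prev_actions, n_actions):
    """Hypothetical placeholder for a policy network.

    Returns logits for agent `agent_idx` that depend on the actions
    already chosen by earlier agents (assumed conditioning scheme).
    """
    shift = sum((i + 1) * a for i, a in enumerate(prev_actions))
    return [math.sin(agent_idx + k + shift) for k in range(n_actions)]


def sample_joint_action(n_agents, n_actions, rng):
    """Sample one action per agent in sequence; return actions and joint log-prob."""
    actions, log_prob = [], 0.0
    for i in range(n_agents):
        probs = softmax(toy_conditional_logits(i, actions, n_actions))
        a = rng.choices(range(n_actions), weights=probs)[0]
        log_prob += math.log(probs[a])  # chain rule: log-probs add up
        actions.append(a)
    return actions, log_prob


def joint_log_prob(actions, n_actions):
    """Evaluate the log-probability of a given joint action under the same factorization."""
    log_prob = 0.0
    for i, a in enumerate(actions):
        probs = softmax(toy_conditional_logits(i, actions[:i], n_actions))
        log_prob += math.log(probs[a])
    return log_prob


def total_probability(n_agents, n_actions):
    """Sum the joint probability over every joint action (feasible only for tiny cases)."""
    return sum(
        math.exp(joint_log_prob(list(joint), n_actions))
        for joint in itertools.product(range(n_actions), repeat=n_agents)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n_agents, n_actions = 3, 4
    actions, lp = sample_joint_action(n_agents, n_actions, rng)
    print("joint action space size:", n_actions ** n_agents)
    print("sampled joint action:", actions, "log-prob:", lp)
    print("probabilities sum to:", total_probability(n_agents, n_actions))
```

Because the factorized conditionals form a valid joint distribution, a joint log-probability computed this way could be plugged into a PPO-style probability ratio for the whole team; how JointPPO does this in detail is not stated in the text above.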
Taxonomy
Topics: Transportation and Mobility Innovations · Advanced Software Engineering Methodologies
