FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility
Lang Feng, Dong Xing, Junru Zhang, Gang Pan

TL;DR
This paper introduces FP3O, a multi-agent PPO algorithm for cooperative MARL that is compatible with different types of parameter sharing. It comes with a policy improvement guarantee and outperforms baselines on Multi-Agent MuJoCo and StarCraft II tasks.
Contribution
We propose a full-pipeline paradigm that makes multi-agent PPO compatible with diverse types of parameter sharing while retaining PPO's theoretical guarantee of policy improvement.
Findings
FP3O outperforms baseline algorithms on Multi-Agent MuJoCo and StarCraft II tasks.
FP3O demonstrates high versatility across different parameter-sharing configurations.
Theoretical analysis establishes a policy improvement guarantee for FP3O.
Abstract
Existing multi-agent PPO algorithms lack compatibility with different types of parameter sharing when extending the theoretical guarantee of PPO to cooperative multi-agent reinforcement learning (MARL). In this paper, we propose a novel and versatile multi-agent PPO algorithm for cooperative MARL that overcomes this limitation. Our approach builds on the proposed full-pipeline paradigm, which establishes multiple parallel optimization pipelines by employing various equivalent decompositions of the advantage function. This formulates the interconnections among agents in a more general way, namely as interconnections among pipelines, which makes the approach compatible with diverse types of parameter sharing. We provide a solid theoretical foundation for policy improvement and subsequently develop a practical algorithm called Full-Pipeline PPO (FP3O) by several…
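The abstract's "equivalent decompositions of the advantage function" can be illustrated with the multi-agent advantage decomposition used in earlier multi-agent PPO work such as HAPPO: for any ordering i_1, ..., i_n of the agents, the joint advantage A(s, a) equals the sum over j of A(s, a^{i_1:j-1}, a^{i_j}) = Q(s, a^{i_1:j}) - Q(s, a^{i_1:j-1}). The sketch below assumes, without confirmation from this page, that FP3O's pipelines correspond to such orderings, and checks numerically on a one-state toy example that every ordering gives the same total. The helper names `marginal_q` and `decomposed_advantage`, the independent per-agent policies, and the random Q-table are hypothetical and are not the authors' code.

```python
import itertools
import random

# Hypothetical helpers illustrating the multi-agent advantage decomposition
# on a single fixed state; not FP3O's implementation.


def marginal_q(q_joint, policies, fixed):
    """Q(s, a^{fixed}): expected joint Q-value when the agents in `fixed`
    (dict: agent -> action) take the given actions and every other agent
    samples independently from its own policy."""
    n = len(policies)
    free = [i for i in range(n) if i not in fixed]
    total = 0.0
    for acts in itertools.product(*(range(len(policies[i])) for i in free)):
        prob = 1.0
        joint = dict(fixed)
        for i, a in zip(free, acts):
            prob *= policies[i][a]
            joint[i] = a
        total += prob * q_joint[tuple(joint[i] for i in range(n))]
    return total


def decomposed_advantage(q_joint, policies, joint_action, order):
    """Per-agent terms Q(s, a^{i_1:j}) - Q(s, a^{i_1:j-1}) along `order`;
    their sum telescopes to Q(s, a) - V(s)."""
    fixed = {}
    prev = marginal_q(q_joint, policies, fixed)  # empty prefix gives V(s)
    terms = []
    for agent in order:
        fixed[agent] = joint_action[agent]
        cur = marginal_q(q_joint, policies, fixed)
        terms.append(cur - prev)
        prev = cur
    return terms


if __name__ == "__main__":
    rng = random.Random(0)
    n_agents, n_actions = 3, 2
    # Hypothetical toy joint Q-table for one state.
    q_joint = {a: rng.uniform(-1.0, 1.0)
               for a in itertools.product(range(n_actions), repeat=n_agents)}
    policies = []
    for _ in range(n_agents):
        p = rng.random()
        policies.append([p, 1.0 - p])
    joint_action = (1, 0, 1)
    joint_adv = q_joint[joint_action] - marginal_q(q_joint, policies, {})
    for order in itertools.permutations(range(n_agents)):
        terms = decomposed_advantage(q_joint, policies, joint_action, order)
        # Individual terms depend on the ordering; their sum does not.
        print(order, [round(t, 3) for t in terms],
              round(sum(terms), 9), round(joint_adv, 9))
```

Running it prints one line per ordering: the per-agent terms change with the ordering, while the last two columns agree up to floating-point rounding. This is the sense in which different decompositions are "equivalent" even though each assigns credit to agents differently.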
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Entropy Regularization · Proximal Policy Optimization
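The Methods tags list PPO and entropy regularization. As a reference point only, here is a minimal sketch of the generic single-agent PPO-clip loss with an entropy bonus over one batch, in plain Python. It is not FP3O's objective: how FP3O feeds per-pipeline advantages into such a loss and handles shared parameters is not described on this page. The function name `ppo_clip_loss` and the defaults `clip_eps=0.2` and `ent_coef=0.01` are assumptions (common choices), not values taken from the paper.

```python
import math

# Hypothetical helper: generic PPO-clip loss with an entropy bonus,
# not FP3O-specific.


def ppo_clip_loss(logp_new, logp_old, advantages, entropies,
                  clip_eps=0.2, ent_coef=0.01):
    """Batch-mean loss to minimize: -(clipped surrogate + ent_coef * entropy).

    logp_new / logp_old: log pi(a|s) under the current and behaviour policies.
    advantages: advantage estimates for the sampled actions.
    entropies: policy entropy at each sampled state (the regularization term).
    """
    if not advantages:
        raise ValueError("empty batch")
    total = 0.0
    for lp_new, lp_old, adv, ent in zip(logp_new, logp_old, advantages, entropies):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        surrogate = min(ratio * adv, clipped * adv)
        total += -(surrogate + ent_coef * ent)
    return total / len(advantages)
```

For example, with the new policy assigning probability 0.5 to an action the old policy gave 0.25, the ratio is 2; for a positive advantage of 1.0 the clip caps the surrogate at 1.2 (loss about -1.2 with `ent_coef=0.0`), while for an advantage of -1.0 the unclipped term -2 is kept, so large harmful updates are still penalized in full.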
