FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility

Lang Feng; Dong Xing; Junru Zhang; Gang Pan

arXiv:2310.05053 · cs.LG · October 10, 2023

TL;DR

This paper introduces FP3O, a versatile multi-agent PPO algorithm that supports various parameter-sharing schemes in cooperative MARL, backed by a theoretical policy-improvement guarantee and strong empirical results on Multi-Agent MuJoCo and StarCraft II.

Contribution

We propose a novel full-pipeline paradigm that makes multi-agent PPO compatible with diverse parameter-sharing methods while extending PPO's theoretical guarantee to the cooperative multi-agent setting.

Findings

1. FP3O outperforms baseline algorithms on Multi-Agent MuJoCo and StarCraft II tasks.

2. FP3O demonstrates high versatility across different parameter-sharing configurations.

3. Theoretical analysis confirms policy improvement guarantees for FP3O.

Abstract

Existing multi-agent PPO algorithms lack compatibility with different types of parameter sharing when extending the theoretical guarantee of PPO to cooperative multi-agent reinforcement learning (MARL). In this paper, we propose a novel and versatile multi-agent PPO algorithm for cooperative MARL to overcome this limitation. Our approach builds on the proposed full-pipeline paradigm, which establishes multiple parallel optimization pipelines by employing various equivalent decompositions of the advantage function. This procedure formulates the interconnections among agents in a more general manner, i.e., as interconnections among pipelines, which makes it compatible with diverse types of parameter sharing. We provide a solid theoretical foundation for policy improvement and subsequently develop a practical algorithm called Full-Pipeline PPO (FP3O) by several…
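The abstract refers to PPO without restating its objective. As background only, the following is a minimal sketch of the standard single-agent clipped surrogate that PPO maximizes; it is not FP3O's multi-agent objective. The function name, the argument names, and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate over a batch of samples (to be maximized).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current and the data-collecting policy. All names here are hypothetical.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the two policies agree, the ratio is 1 and the objective reduces to the mean advantage. Clipping only removes the incentive to move the ratio further in the direction that would increase the objective.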

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Entropy Regularization · Proximal Policy Optimization