More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization

Jiangxing Wang, Deheng Ye, and Zongqing Lu

arXiv:2209.12681 · cs.LG · February 13, 2023 · 5 cites

TL;DR

This paper introduces MACPF, a cooperative multi-agent reinforcement learning method that explicitly models dependencies among agents during centralized training while still producing factorized policies for decentralized execution.

Contribution

The paper proposes MACPF, which models dependencies among agents during centralized training yet still yields local policies for decentralized execution, addressing the suboptimality that the usual independence assumption can cause.

Findings

01. MACPF outperforms baselines in various cooperative MARL tasks.

02. MACPF converges faster than existing methods.

03. Theoretical analysis shows that, from a joint policy learned with agent dependencies, another joint policy can always be derived that achieves the same optimality and can be factorized for decentralized execution.

Abstract

In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn stochastic policies, which are more suitable for the partially observable environment. Given the goal of learning local policies that enable decentralized execution, agents are commonly assumed to be independent of each other, even in centralized training. However, such an assumption may prohibit agents from learning the optimal joint policy. To address this problem, we explicitly take the dependency among agents into centralized training. Although this leads to the optimal joint policy, it may not be factorized for decentralized execution. Nevertheless, we theoretically show that from such a joint policy, we can always derive another joint policy that achieves the same optimality but can be factorized for decentralized execution. To this end, we propose…
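The factorization claim in the abstract can be illustrated with a toy example. The sketch below is not the paper's construction or the MACPF algorithm: the single-state, two-agent matching game, the payoff table, and every function name are assumptions made for illustration. It shows a correlated joint policy that is optimal but is not a product of local policies, the loss incurred by naively taking the product of its marginals, and a different joint policy that keeps the same return while being factorized.

```python
from itertools import product

# Hypothetical 2-agent, 2-action cooperative game: shared reward 1 when actions match.
ACTIONS = (0, 1)
PAYOFF = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}


def expected_return(joint):
    """Expected shared reward under a joint action distribution."""
    return sum(prob * PAYOFF[a] for a, prob in joint.items())


def marginals(joint):
    """Per-agent marginal action distributions of a joint policy."""
    m1 = {a: sum(p for (a1, _), p in joint.items() if a1 == a) for a in ACTIONS}
    m2 = {a: sum(p for (_, a2), p in joint.items() if a2 == a) for a in ACTIONS}
    return m1, m2


def product_policy(m1, m2):
    """Joint policy in which the two agents act independently."""
    return {(a1, a2): m1[a1] * m2[a2] for a1, a2 in product(ACTIONS, ACTIONS)}


def is_factorized(joint, tol=1e-9):
    """True if the joint policy equals the product of its own marginals."""
    prod = product_policy(*marginals(joint))
    return all(abs(joint[a] - prod[a]) <= tol for a in joint)


# A dependent (correlated) joint policy: both agents pick 0, or both pick 1.
dependent = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

# Naive decentralization: each agent samples from its own marginal.
naive = product_policy(*marginals(dependent))

# A different joint policy with the same return: commit to one joint action
# that the dependent policy already plays with positive probability.
support = [a for a, p in dependent.items() if p > 0]
best = max(support, key=PAYOFF.get)
derived = {a: (1.0 if a == best else 0.0) for a in dependent}

if __name__ == "__main__":
    for name, pol in [("dependent", dependent), ("naive", naive), ("derived", derived)]:
        print(name, expected_return(pol), is_factorized(pol))
```

In this toy game the derived policy is deterministic, which makes factorization trivial; the abstract's statement concerns stochastic policies under partial observability, so the example conveys only the intuition.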

Code & Models

Repositories

pku-rl/fop-dmac-macpf
PyTorch · Official

Videos

More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics