More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization
Jiangxing Wang, Deheng Ye, and Zongqing Lu

TL;DR
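This paper introduces MACPF (Multi-Agent Conditional Policy Factorization), a cooperative multi-agent reinforcement learning method that explicitly models dependencies among agents during centralized training while still producing policies that can be executed in a decentralized way, and reports improved performance over baselines.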
Contribution
The paper proposes MACPF, which models dependencies among agents during centralized training while still enabling decentralized execution. This addresses a limitation of the common independence assumption, which may prevent agents from learning the optimal joint policy.
Findings
MACPF outperforms baselines in various cooperative MARL tasks.
MACPF achieves faster convergence compared to existing methods.
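Theoretical analysis shows that from a joint policy learned with agent dependencies, one can always derive another joint policy that achieves the same optimality and can be factorized for decentralized execution.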
Abstract
In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn stochastic policies, which are more suitable for partially observable environments. Given the goal of learning local policies that enable decentralized execution, agents are commonly assumed to be independent of each other, even in centralized training. However, such an assumption may prohibit agents from learning the optimal joint policy. To address this problem, we explicitly incorporate the dependency among agents into centralized training. Although this leads to the optimal joint policy, that policy may not be factorizable for decentralized execution. Nevertheless, we theoretically show that from such a joint policy, we can always derive another joint policy that achieves the same optimality but can be factorized for decentralized execution. To this end, we propose…
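The gap between a dependent joint policy and a factorized one can be illustrated with a toy example. The sketch below is not the paper's algorithm: the two-agent matrix game, the reward table, and all function names are hypothetical and chosen only to make the abstract's argument concrete. It shows a dependent joint policy that is optimal, the naive product of its marginals (which is not), and a different, hand-picked joint policy that is both optimal and factorized. How MACPF actually derives such a policy is not described in this summary.

```python
import itertools

# Hypothetical 2-agent, 2-action cooperative matrix game (not from the paper):
# the team is rewarded only when both agents choose the same action.
ACTIONS = (0, 1)
REWARD = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}


def expected_reward(joint):
    """Expected team reward under a joint policy given as {(a1, a2): prob}."""
    return sum(prob * REWARD[actions] for actions, prob in joint.items())


def conditional_joint(pi1, pi2_given_a1):
    """Dependent joint policy: pi(a1, a2) = pi1(a1) * pi2(a2 | a1)."""
    return {
        (a1, a2): pi1[a1] * pi2_given_a1[a1][a2]
        for a1, a2 in itertools.product(ACTIONS, ACTIONS)
    }


def independent_joint(pi1, pi2):
    """Factorized joint policy: pi(a1, a2) = pi1(a1) * pi2(a2)."""
    return {
        (a1, a2): pi1[a1] * pi2[a2]
        for a1, a2 in itertools.product(ACTIONS, ACTIONS)
    }


def marginals(joint):
    """Per-agent marginal action distributions of a joint policy."""
    m1 = [sum(joint[(a1, a2)] for a2 in ACTIONS) for a1 in ACTIONS]
    m2 = [sum(joint[(a1, a2)] for a1 in ACTIONS) for a2 in ACTIONS]
    return m1, m2


# Dependent policy: agent 1 flips a fair coin, agent 2 copies agent 1's action.
dependent = conditional_joint([0.5, 0.5], {0: [1.0, 0.0], 1: [0.0, 1.0]})

# Naive factorization: multiply the marginals, dropping the dependency.
naive = independent_joint(*marginals(dependent))

# A different joint policy, picked by hand for this toy game: both agents
# deterministically choose action 0. It is factorized and equally optimal.
derived = independent_joint([1.0, 0.0], [1.0, 0.0])

print("dependent:", expected_reward(dependent))
print("naive product of marginals:", expected_reward(naive))
print("derived factorized policy:", expected_reward(derived))
```

In this toy game the dependent policy cannot be executed without agent 2 seeing agent 1's action, and simply discarding that dependency halves the expected reward, whereas the hand-picked factorized policy matches the dependent one. This mirrors the abstract's claim only informally.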
Taxonomy
Topics: Reinforcement Learning in Robotics
