Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient
Wubing Chen, Wenbin Li, Xiao Liu, Shangdong Yang, Yang Gao

TL;DR
This paper introduces MAPPG, a multi-agent policy gradient method that uses a polarization function to improve credit assignment. Its individual policies are proven to converge to the global optimum, and it outperforms state-of-the-art MAPG algorithms on StarCraft II tasks.
Contribution
MAPPG employs a polarization function to effectively address the centralized-decentralized mismatch in MAPG, enabling better credit assignment and convergence guarantees.
Findings
MAPPG converges to the global optimum in matrix and differential games.
MAPPG outperforms state-of-the-art MAPG algorithms in StarCraft II tasks.
MAPPG's individual policies are theoretically proven to converge to the global optimum.
Abstract
Cooperative multi-agent policy gradient (MAPG) algorithms have recently attracted wide attention and are regarded as a general scheme for the multi-agent system. Credit assignment plays an important role in MAPG and can induce cooperation among multiple agents. However, most MAPG algorithms cannot achieve good credit assignment because of the game-theoretic pathology known as centralized-decentralized mismatch. To address this issue, this paper presents a novel method, Multi-Agent Polarization Policy Gradient (MAPPG). MAPPG takes a simple but efficient polarization function to transform the optimal consistency of joint and individual actions into easily realized constraints, thus enabling efficient credit assignment in MAPG. Theoretically, we prove that individual policies of MAPPG can converge to the global…
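One way the inconsistency between joint and individual actions can show up is sketched below on a small two-agent matrix game. This is an illustration under stated assumptions, not the authors' method. The payoff values, the uniformly random partner, and the exponential `polarize` function are all hypothetical choices made for this example. The abstract does not specify the form of MAPPG's polarization function.

```python
import math

# Hypothetical cooperative payoff matrix (rows: agent 1, columns: agent 2).
# The values are illustrative and are not taken from the paper.
PAYOFF = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def argmax(values):
    """Index of the first maximal entry."""
    return max(range(len(values)), key=lambda i: values[i])


def joint_argmax(joint):
    """(row, column) of the best joint action."""
    flat = [q for row in joint for q in row]
    return divmod(argmax(flat), len(joint[0]))


def marginal_values(joint):
    """Agent 1's expected value per action, assuming agent 2 acts uniformly at random."""
    return [sum(row) / len(row) for row in joint]


def polarize(joint, alpha=1.0):
    """Assumed polarization: exp(alpha * (q - max q)). Not the paper's function."""
    top = max(max(row) for row in joint)
    return [[math.exp(alpha * (q - top)) for q in row] for row in joint]


# Greedy individual action on the raw joint values vs. on the polarized ones.
raw_choice = argmax(marginal_values(PAYOFF))
polar_choice = argmax(marginal_values(polarize(PAYOFF)))

print(joint_argmax(PAYOFF))  # (0, 0)
print(raw_choice)            # 1
print(polar_choice)          # 0
```

On the raw values, agent 1's greedy action (1) disagrees with its part of the best joint action (0), because the penalties in the first row dominate the average. After the assumed sharpening, the optimal joint action dominates the marginal and the two agree, which is the kind of consistency the abstract refers to.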
Taxonomy
Topics: Reinforcement Learning in Robotics
