Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient

Wubing Chen; Wenbin Li; Xiao Liu; Shangdong Yang; Yang Gao

arXiv:2210.05367 · cs.LG · March 7, 2023 · 1 citation

TL;DR

This paper introduces MAPPG, a multi-agent policy gradient method that uses a polarization function to improve credit assignment. It converges to the global optimum in matrix and differential games and outperforms state-of-the-art MAPG algorithms on StarCraft II tasks.

Contribution

MAPPG employs a polarization function to effectively address the centralized-decentralized mismatch in MAPG, enabling better credit assignment and convergence guarantees.
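
As a rough illustration of why decentralized action selection can disagree with the joint optimum, the sketch below uses a hypothetical 3x3 cooperative payoff matrix (the numbers are an assumption for illustration and are not taken from the paper). It does not implement MAPPG or its polarization function; it only shows that when an agent averages the shared reward over an exploring partner, its individually greedy action can differ from its part of the best joint action, and that the disagreement disappears once the partner mostly plays the optimal action.

```python
# Hypothetical shared-reward matrix: rows are agent 1's actions,
# columns are agent 2's actions. Not taken from the paper.
PAYOFF = [
    [8, -12, -12],
    [-12, 0, 0],
    [-12, 0, 0],
]


def joint_optimum(payoff):
    """Return the (row, col) joint action with the highest shared reward."""
    cells = [(r, c) for r in range(len(payoff)) for c in range(len(payoff[0]))]
    return max(cells, key=lambda rc: payoff[rc[0]][rc[1]])


def marginal_values(payoff, partner_policy):
    """Expected reward of each row action, averaged over the partner's policy."""
    return [sum(p * q for p, q in zip(partner_policy, row)) for row in payoff]


def greedy_action(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Partner explores uniformly: the coordinated action 0 looks bad to agent 1.
uniform = [1 / 3, 1 / 3, 1 / 3]
values = marginal_values(PAYOFF, uniform)
best_joint = joint_optimum(PAYOFF)
best_individual = greedy_action(values)

# Partner mostly plays the optimal action: agent 1's greedy choice now agrees.
committed = [0.9, 0.05, 0.05]
aligned_individual = greedy_action(marginal_values(PAYOFF, committed))

print("joint optimum:", best_joint)
print("greedy action vs. uniform partner:", best_individual)
print("greedy action vs. committed partner:", aligned_individual)
```

In this toy setting the suboptimality of one agent's policy changes what looks best to the other agent, which is the kind of interaction that credit assignment methods aim to handle.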

Findings

01 MAPPG converges to the global optimum in matrix and differential games.

02 MAPPG outperforms state-of-the-art MAPG algorithms in StarCraft II tasks.

03 The authors prove theoretically that the individual policies of MAPPG can converge to the global optimum.

Abstract

Cooperative multi-agent policy gradient (MAPG) algorithms have recently attracted wide attention and are regarded as a general scheme for the multi-agent system. Credit assignment plays an important role in MAPG and can induce cooperation among multiple agents. However, most MAPG algorithms cannot achieve good credit assignment because of the game-theoretic pathology known as centralized-decentralized mismatch. To address this issue, this paper presents a novel method, Multi-Agent Polarization Policy Gradient (MAPPG). MAPPG takes a simple but efficient polarization function to transform the optimal consistency of joint and individual actions into easily realized constraints, thus enabling efficient credit assignment in MAPG. Theoretically, we prove that individual policies of MAPPG can converge to the global…

Code & Models

Repositories

code-cultivater/MAPPG
pytorch

Videos

Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient · underline

Taxonomy

Topics: Reinforcement Learning in Robotics