Off-Policy Multi-Agent Decomposed Policy Gradients

Yihan Wang; Beining Han; Tonghan Wang; Heng Dong; Chongjie Zhang

arXiv:2007.12322 · cs.LG · October 6, 2020 · 42 citations

TL;DR

This paper introduces DOP, a multi-agent decomposed policy gradient method that supports efficient off-policy learning, addresses centralized-decentralized mismatch and credit assignment, and outperforms state-of-the-art algorithms on StarCraft II benchmarks.

Contribution

The paper presents DOP, a multi-agent decomposed policy gradient approach that brings value function decomposition into the multi-agent actor-critic framework, enabling off-policy learning and credit assignment in both discrete and continuous action spaces.

Findings

01 DOP outperforms state-of-the-art algorithms on StarCraft II benchmarks.

02 DOP effectively supports off-policy learning in multi-agent settings.

03 DOP critics are formally shown to have sufficient representational capability to guarantee convergence.

Abstract

Multi-agent policy gradient (MAPG) methods have recently witnessed vigorous progress. However, there is a significant performance discrepancy between MAPG methods and state-of-the-art multi-agent value-based approaches. In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP). This method introduces the idea of value function decomposition into the multi-agent actor-critic framework. Based on this idea, DOP supports efficient off-policy learning and addresses the issues of centralized-decentralized mismatch and credit assignment in both discrete and continuous action spaces. We formally show that DOP critics have sufficient representational capability to guarantee convergence. In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments…
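
To make the idea of value function decomposition in an actor-critic setting more concrete, here is a minimal sketch in plain Python. It assumes a linear decomposition in which the joint value is a non-negatively weighted sum of per-agent values plus a bias; this page does not state that form, so treat it as an illustrative assumption. The names `q_tot` and `per_agent_credit`, and all numbers, are hypothetical. This is not the code from the TonghanWang/DOP repository.

```python
from typing import List, Sequence


def q_tot(per_agent_q: Sequence[Sequence[float]],
          actions: Sequence[int],
          weights: Sequence[float],
          bias: float) -> float:
    """Joint value under an assumed linear decomposition:
    Q_tot = sum_i k_i * Q_i(a_i) + b, with every k_i >= 0."""
    if any(k < 0 for k in weights):
        raise ValueError("mixing weights must be non-negative")
    return sum(k * q[a] for k, q, a in zip(weights, per_agent_q, actions)) + bias


def per_agent_credit(per_agent_q: Sequence[Sequence[float]],
                     policies: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[List[float]]:
    """Per-agent, per-action credit k_i * (Q_i(a) - E_{a' ~ pi_i}[Q_i(a')]).

    Because the assumed critic is linear, the other agents' actions cancel
    out, so each agent's signal depends only on its own value estimates."""
    credits = []
    for k, q, pi in zip(weights, per_agent_q, policies):
        baseline = sum(p * v for p, v in zip(pi, q))  # expected Q_i under pi_i
        credits.append([k * (v - baseline) for v in q])
    return credits


# Toy example (hypothetical numbers): two agents with 2 and 3 actions.
per_agent_q = [[1.0, 3.0], [0.5, 0.0, 2.0]]
policies = [[0.5, 0.5], [0.25, 0.25, 0.5]]
weights = [2.0, 1.0]
bias = -1.0

joint_value = q_tot(per_agent_q, [1, 2], weights, bias)      # 2*3 + 1*2 - 1
credits = per_agent_credit(per_agent_q, policies, weights)

# Switching only agent 0's action changes Q_tot by k_0 * (Q_0 difference).
delta = joint_value - q_tot(per_agent_q, [0, 2], weights, bias)
print(joint_value, credits, delta)
```

Under this assumption, each agent's learning signal can be computed from its own value estimates without enumerating joint actions, which is one way to read the abstract's claim about credit assignment.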

Code & Models

Repositories

TonghanWang/DOP (PyTorch)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Traffic control and management