Off-Policy Multi-Agent Decomposed Policy Gradients
Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang

TL;DR
This paper introduces DOP, a multi-agent decomposed policy gradient method that supports efficient off-policy learning, addresses centralized-decentralized mismatch and credit assignment, and outperforms state-of-the-art algorithms on StarCraft II benchmarks.
Contribution
The paper presents DOP, a multi-agent decomposed policy gradient method that introduces value function decomposition into the multi-agent actor-critic framework.
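As a rough illustration of what value function decomposition can look like, the sketch below mixes per-agent utilities linearly into a joint value. This summary does not spell out the form of the decomposition, so the linear form, the non-negative weights `k`, the bias `b`, and the function name `linear_q_tot` are all assumptions made for illustration, not the authors' implementation.

```python
def linear_q_tot(q_tables, joint_action, k, b):
    """Mix per-agent utilities into a joint value (illustrative assumption).

    q_tables:     one list of action utilities per agent, Q_i(a_i)
    joint_action: one chosen action index per agent
    k:            per-agent mixing weights, assumed non-negative
    b:            scalar bias term
    Returns sum_i k[i] * Q_i(a_i) + b.
    """
    if any(w < 0 for w in k):
        raise ValueError("mixing weights k_i must be non-negative")
    if not (len(q_tables) == len(joint_action) == len(k)):
        raise ValueError("need one table, action, and weight per agent")
    # Weighted sum of each agent's utility for its own chosen action.
    mixed = sum(w * q[a] for w, q, a in zip(k, q_tables, joint_action))
    return mixed + b


def greedy_joint_action(q_tables):
    """Each agent independently picks the action maximizing its own utility."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_tables]
```

With non-negative weights, each agent choosing the action that maximizes its own utility also maximizes the mixed joint value, which is one reason a decomposition of this kind fits decentralized execution.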
Findings
DOP outperforms state-of-the-art algorithms on StarCraft II benchmarks.
DOP effectively supports off-policy learning in multi-agent settings.
DOP critics are formally shown to have sufficient representational capability to guarantee convergence.
Abstract
Multi-agent policy gradient (MAPG) methods have recently witnessed vigorous progress. However, there is a significant performance discrepancy between MAPG methods and state-of-the-art multi-agent value-based approaches. In this paper, we investigate causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP). This method introduces the idea of value function decomposition into the multi-agent actor-critic framework. Based on this idea, DOP supports efficient off-policy learning and addresses the issue of centralized-decentralized mismatch and credit assignment in both discrete and continuous action spaces. We formally show that DOP critics have sufficient representational capability to guarantee convergence. In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Traffic control and management
