Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling
Benjamin Freed, Aditya Kapoor, Ian Abraham, Jeff Schneider, Howie Choset

TL;DR
This paper introduces partial reward decoupling (PRD), a method that decomposes multi-agent RL problems into subproblems to improve credit assignment, data efficiency, and stability in cooperative settings.
Contribution
The paper proposes PRD, an approach that simplifies credit assignment in multi-agent RL by decomposing large cooperative problems into decoupled subproblems over subsets of agents, leading to better performance than existing methods like COMA.
Findings
PRD reduces variance in policy gradient estimates.
PRD improves data efficiency and learning stability.
PRD outperforms COMA in various multi-agent tasks.
Abstract
One of the preeminent obstacles to scaling multi-agent reinforcement learning to large numbers of agents is assigning credit to individual agents' actions. In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD), which attempts to decompose large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment. We empirically demonstrate that decomposing the RL problem using PRD in an actor-critic algorithm results in lower variance policy gradient estimates, which improves data efficiency, learning stability, and asymptotic performance across a wide array of multi-agent RL tasks, compared to various other actor-critic approaches. Additionally, we relate our approach to counterfactual multi-agent policy gradient (COMA), a state-of-the-art MARL…
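To make the variance claim concrete, here is a minimal toy sketch; it is not the paper's algorithm. It assumes a hypothetical setting in which each agent's reward depends only on its own binary action, so the true "relevant set" for agent 0 is just agent 0 itself. The function name `gradient_samples` and the parameter `relevant` are invented for illustration. How PRD actually identifies relevant subsets of agents is not covered here.

```python
import math
import random
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradient_samples(n_agents, relevant, n_samples=20000, theta=0.0, seed=0):
    """Score-function gradient samples for agent 0's policy logit.

    `relevant` is the (hypothetical) set of agents whose rewards are credited
    to agent 0. In this toy problem each agent's reward depends only on its
    own action, so {0} is the true relevant set for agent 0.
    """
    rng = random.Random(seed)
    p = sigmoid(theta)  # every agent takes action 1 with probability p
    samples = []
    for _ in range(n_samples):
        actions = [1 if rng.random() < p else 0 for _ in range(n_agents)]
        rewards = [float(a) for a in actions]  # agent j earns 1 for action 1
        score = actions[0] - p  # d/d(theta) of log pi(a_0) for a Bernoulli policy
        credited = sum(rewards[j] for j in relevant)
        samples.append(score * credited)
    return samples


n_agents = 10
# Coupled: agent 0 is credited with the whole team's reward.
coupled = gradient_samples(n_agents, relevant=range(n_agents))
# Decoupled: agent 0 is credited only with rewards it can influence.
decoupled = gradient_samples(n_agents, relevant=[0])

for name, g in (("coupled", coupled), ("decoupled", decoupled)):
    print(name, "mean:", statistics.mean(g), "variance:", statistics.variance(g))
```

In this toy setup both estimators target the same gradient, p(1 - p) = 0.25 at theta = 0, because the other agents' rewards are independent of agent 0's action and contribute only noise. Dropping them leaves the expectation unchanged while removing that noise, which is the intuition behind decoupling credit across subsets of agents.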
