Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization
Aditya Kapoor, Benjamin Freed, Howie Choset, Jeff Schneider

TL;DR
This paper introduces PRD-MAPPO, a multi-agent reinforcement learning algorithm that improves credit assignment in MAPPO by using a learned attention mechanism to dynamically decompose large groups of agents into smaller subgroups, improving data efficiency and asymptotic performance.
Contribution
The paper integrates partial reward decoupling (PRD), driven by a learned attention mechanism, into MAPPO to improve credit assignment, and includes a variant for shared-reward settings.
Findings
PRD-MAPPO outperforms MAPPO and other methods in multi-agent tasks.
It improves both data efficiency and asymptotic performance relative to those baselines.
The shared reward version of PRD-MAPPO is effective.
Abstract
Multi-agent proximal policy optimization (MAPPO) has recently demonstrated state-of-the-art performance on challenging multi-agent reinforcement learning tasks. However, MAPPO still struggles with the credit assignment problem, wherein the sheer difficulty in ascribing credit to individual agents' actions scales poorly with team size. In this paper, we propose a multi-agent reinforcement learning algorithm that adapts recent developments in credit assignment to improve upon MAPPO. Our approach leverages partial reward decoupling (PRD), which uses a learned attention mechanism to estimate which of a particular agent's teammates are relevant to its learning updates. We use this estimate to dynamically decompose large groups of agents into smaller, more manageable subgroups. We empirically demonstrate that our approach, PRD-MAPPO, decouples agents from teammates that do not influence their…
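The decoupling idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that relevance is decided by thresholding softmax attention weights, that each agent is always relevant to itself, and that an agent's learning signal sums only the rewards of its relevant set. The function names (`relevant_sets`, `decoupled_rewards`), the `threshold` value, and the random query/key vectors are all hypothetical stand-ins for quantities a learned critic would produce.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relevant_sets(queries, keys, threshold=0.1):
    """Return attention weights and a boolean relevance mask.

    weights[i, j] is how strongly agent i attends to agent j.
    mask[i, j] is True when agent j is treated as relevant to agent i.
    The threshold rule is an assumption made for this sketch.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    mask = weights > threshold
    np.fill_diagonal(mask, True)             # assumption: an agent is always relevant to itself
    return weights, mask

def decoupled_rewards(rewards, mask):
    # For each agent i, sum the rewards of the agents in its relevant set only.
    return mask.astype(rewards.dtype) @ rewards

# Usage with made-up inputs: 4 agents, 8-dimensional embeddings.
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
keys = rng.normal(size=(4, 8))
rewards = np.array([1.0, 2.0, 3.0, 4.0])

weights, mask = relevant_sets(queries, keys, threshold=0.1)
print(mask.sum(axis=1))                  # size of each agent's relevant set
print(decoupled_rewards(rewards, mask))  # per-agent decoupled reward
```

With a mask of all `True` values the sketch reduces to every agent learning from the full team reward, while an identity mask gives each agent only its own reward; the learned attention is meant to select something in between.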
Taxonomy
Topics: Auction Theory and Applications · Efficiency Analysis Using DEA · Supply Chain and Inventory Management
Methods: Softmax · Attention Is All You Need
