Q-value Path Decomposition for Deep Multiagent Reinforcement Learning
Yaodong Yang, Jianye Hao, Guangyong Chen, Hongyao Tang, Yingfeng Chen, Yujing Hu, Changjie Fan, Zhongyu Wei

TL;DR
This paper introduces Q-value Path Decomposition (QPD), a novel method for multiagent credit assignment in deep MARL, using integrated gradients to decompose global Q-values for improved coordination.
Contribution
QPD leverages integrated gradient attribution to directly decompose global Q-values into individual agent Q-values, enhancing multiagent credit assignment in deep MARL.
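The attribution idea behind this contribution can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `toy_global_q` is a hypothetical stand-in for a learned global Q-network, the agent feature layout is invented, gradients are taken by finite differences, and the integration path is a plain straight line from an all-zero baseline. The sketch only shows how integrated gradients split a global Q-value into per-agent shares whose sum recovers the difference between the Q-value at the input and at the baseline.

```python
import math


def toy_global_q(x):
    """Hypothetical global Q-function over joint features of two agents.

    Assumed layout: x = [agent0_f0, agent0_f1, agent1_f0, agent1_f1].
    """
    return (math.tanh(x[0] * x[2]) + 0.5 * x[1] ** 2
            + math.sin(x[3]) + 0.3 * x[0] * x[3])


def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=200):
    """Per-feature attributions along the straight line baseline -> x.

    The path integral is approximated with a midpoint Riemann sum.
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = numerical_grad(f, point)
        for i in range(n):
            avg_grad[i] += g[i] / steps
    return [(xi - b) * gi for xi, b, gi in zip(x, baseline, avg_grad)]


def decompose_by_agent(attributions, agent_slices):
    """Sum feature attributions over each agent's own feature slice."""
    return [sum(attributions[s]) for s in agent_slices]


joint_features = [0.8, -0.4, 1.2, 0.5]      # illustrative values only
baseline = [0.0, 0.0, 0.0, 0.0]             # assumed all-zero baseline
agent_slices = [slice(0, 2), slice(2, 4)]   # agent 0, agent 1

attributions = integrated_gradients(toy_global_q, joint_features, baseline)
agent_q = decompose_by_agent(attributions, agent_slices)
q_gap = toy_global_q(joint_features) - toy_global_q(baseline)

print("per-agent shares:", agent_q)
print("sum of shares:", sum(agent_q), "Q(x) - Q(baseline):", q_gap)
```

The two printed totals should agree up to numerical error. This is the completeness property of integrated gradients, and it is what makes the per-agent shares usable as a decomposition of the global value. The paper's own choice of path and baseline may differ from the straight-line simplification used here.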
Findings
QPD achieves state-of-the-art performance on StarCraft II micromanagement tasks.
QPD improves coordination in both homogeneous and heterogeneous multiagent scenarios.
On these tasks, QPD outperforms existing cooperative MARL algorithms.
Abstract
Recently, deep multiagent reinforcement learning (MARL) has become a highly active research area, as many real-world problems can be inherently viewed as multiagent systems. A particularly interesting and widely applicable class of problems is the partially observable cooperative multiagent setting, in which a team of agents learns to coordinate their behaviors conditioned on their private observations and a commonly shared global reward signal. One natural solution is to resort to the centralized training and decentralized execution paradigm. During centralized training, one key challenge is multiagent credit assignment: how to allocate the global reward among individual agent policies for better coordination towards maximizing system-level benefits. In this paper, we propose a new method called Q-value Path Decomposition (QPD) to decompose the system's global Q-values into…
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Security and Resilience · Fault Detection and Control Systems
