Q-value Path Decomposition for Deep Multiagent Reinforcement Learning

Yaodong Yang; Jianye Hao; Guangyong Chen; Hongyao Tang; Yingfeng Chen; Yujing Hu; Changjie Fan; Zhongyu Wei

arXiv:2002.03950·cs.MA·February 11, 2020·26 cites



TL;DR

This paper introduces Q-value Path Decomposition (QPD), a novel method for multiagent credit assignment in deep MARL, using integrated gradients to decompose global Q-values for improved coordination.

Contribution

QPD leverages integrated gradient attribution to directly decompose global Q-values into individual agent Q-values, enhancing multiagent credit assignment in deep MARL.
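As a rough illustration of the attribution technique named above, the following minimal sketch computes integrated gradients for a toy scalar function and checks that the per-input attributions sum to the change in the function's value. It is not the paper's implementation: the function `toy_global_q`, the all-zeros baseline, the straight-line integration path, and the finite-difference gradients are all assumptions chosen only to keep the example self-contained.

```python
import numpy as np


def integrated_gradients(f, x, baseline, steps=200, eps=1e-5):
    """Approximate integrated gradients of a scalar function f at a 1-D input x.

    Assumptions (illustrative only): a straight-line path from `baseline`
    to `x`, a midpoint Riemann sum, and central finite-difference gradients.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    delta = x - baseline

    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps

    avg_grad = np.zeros_like(x)
    for a in alphas:
        point = baseline + a * delta
        grad = np.zeros_like(x)
        for i in range(x.size):
            step = np.zeros_like(x)
            step[i] = eps
            # Central difference estimate of the partial derivative df/dx_i.
            grad[i] = (f(point + step) - f(point - step)) / (2 * eps)
        avg_grad += grad
    avg_grad /= steps

    # Attribution of input i: (x_i - baseline_i) * average gradient on the path.
    return delta * avg_grad


def toy_global_q(z):
    """Hypothetical stand-in for a global Q-value over two agents' features."""
    return z[0] * z[1] + 2.0 * z[0] + np.tanh(z[1])


x = np.array([1.0, 2.0])        # one scalar feature per agent (assumed)
baseline = np.zeros(2)          # assumed reference input

attr = integrated_gradients(toy_global_q, x, baseline)
total = toy_global_q(x) - toy_global_q(baseline)

print("per-input attributions:", attr)
print("sum of attributions:   ", attr.sum())
print("f(x) - f(baseline):    ", total)
```

The property this sketch exercises is completeness: the attributions add up to the difference between the function's value at the input and at the baseline, which is what makes integrated gradients a natural candidate for splitting one global value into per-agent shares.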

Findings

01. QPD achieves state-of-the-art performance on StarCraft II tasks.

02. QPD improves coordination in both homogeneous and heterogeneous multiagent scenarios.

03. The method outperforms existing cooperative MARL algorithms.

Abstract

Deep multiagent reinforcement learning (MARL) has recently become a highly active research area, since many real-world problems can naturally be viewed as multiagent systems. A particularly interesting and widely applicable class of problems is the partially observable cooperative multiagent setting, in which a team of agents learns to coordinate its behavior conditioned on private observations and a commonly shared global reward signal. One natural solution is the centralized training and decentralized execution paradigm. During centralized training, one key challenge is multiagent credit assignment: how to allocate the global reward among individual agent policies for better coordination toward maximizing system-level benefits. In this paper, we propose a new method called Q-value Path Decomposition (QPD) to decompose the system's global Q-values into…


Videos

Q-value Path Decomposition for Deep Multiagent Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Security and Resilience · Fault Detection and Control Systems