Multiagent Soft Q-Learning

Ermo Wei; Drew Wicke; David Freelan; Sean Luke

arXiv:1804.09817·cs.AI·April 27, 2018·45 cites

TL;DR

This paper introduces Multiagent Soft Q-learning, a novel approach that improves coordination in continuous multiagent reinforcement learning tasks by overcoming local optima issues inherent in policy gradient methods.

Contribution

The paper proposes Multiagent Soft Q-learning, addressing the relative overgeneralization problem and demonstrating superior performance over MADDPG in cooperative tasks.

Findings

01. Achieves better coordination in multiagent cooperative tasks

02. Converges to better local optima in the joint action space

03. Outperforms MADDPG in experimental comparisons

Abstract

Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
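To make the two ideas in the abstract concrete, here is a minimal sketch rather than the authors' implementation. It assumes a hypothetical 3x3 cooperative payoff table with discrete actions (the paper itself targets continuous controls), and the names `PAYOFF`, `average_payoff_per_action`, `soft_joint_policy`, and `soft_value` are invented for illustration. The first function shows relative overgeneralization: an agent that averages over its partner's actions prefers a suboptimal action. The other two show the standard soft Q-learning quantities, a Boltzmann policy over joint actions and the log-sum-exp soft value, which concentrate on the best joint action as the temperature shrinks.

```python
import numpy as np

# Hypothetical shared-reward payoff table (an assumption, not taken from the paper).
# Rows index agent 1's action, columns index agent 2's action.
PAYOFF = np.array([
    [10.0, -20.0, -20.0],   # joint action (0, 0) is the global optimum
    [-20.0, 5.0, 5.0],
    [-20.0, 5.0, 5.0],
])


def average_payoff_per_action(payoff):
    """Expected reward of each agent-1 action if agent 2 acts uniformly at random."""
    return payoff.mean(axis=1)


def soft_joint_policy(q, temperature):
    """Boltzmann distribution over joint actions: pi proportional to exp(Q / temperature)."""
    logits = q / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def soft_value(q, temperature):
    """Soft value: temperature * log(sum(exp(Q / temperature))), computed stably."""
    q_max = q.max()
    return q_max + temperature * np.log(np.exp((q - q_max) / temperature).sum())


if __name__ == "__main__":
    # Averaging over the partner's actions makes action 0 look worst,
    # even though it belongs to the best joint action.
    print("average payoff per agent-1 action:", average_payoff_per_action(PAYOFF))

    # A high temperature spreads probability widely; a low one concentrates it.
    for temperature in (100.0, 1.0, 0.1):
        policy = soft_joint_policy(PAYOFF, temperature)
        best = np.unravel_index(policy.argmax(), policy.shape)
        print(f"temperature={temperature}: most likely joint action {best}, "
              f"soft value {soft_value(PAYOFF, temperature):.3f}")
```

Working over the joint table rather than per-agent averages is what lets the low-temperature policy find the (0, 0) optimum in this toy case. How the paper carries this over to continuous actions and multiple learners is not specified in the abstract, so the sketch should be read only as intuition.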

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning

Methods: Weight Decay · Convolution · Adam · Experience Replay · Dense Connections · Batch Normalization · MADDPG · Q-Learning