Multiagent Soft Q-Learning
Ermo Wei, Drew Wicke, David Freelan, Sean Luke

TL;DR
This paper introduces Multiagent Soft Q-learning, which improves coordination in continuous multiagent reinforcement learning tasks by avoiding the poor local optima that policy gradient methods can settle into.
Contribution
The paper proposes Multiagent Soft Q-learning, addressing the relative overgeneralization problem and demonstrating superior performance over MADDPG in cooperative tasks.
Findings
Achieves better coordination in multiagent cooperative tasks
Converges to better local optima in joint action space
Outperforms MADDPG in experimental comparisons
Abstract
Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint-action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
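To make relative overgeneralization concrete, here is a minimal sketch on a discrete toy game. It is an illustration only, not the paper's method or experiments: the 3x3 payoff matrix, the function names, and the temperature values are all assumptions chosen for the example. Each agent that scores its actions by averaging over a uniformly random partner is drawn to a "safe" action, even though a better joint action exists. The last function computes the soft (log-sum-exp) value commonly used in soft Q-learning, which tends toward the maximum joint payoff as the temperature `alpha` shrinks.

```python
import math

# Hypothetical shared-reward payoff matrix for two cooperating agents.
# Rows are agent 1's actions, columns are agent 2's actions.
# These numbers are an assumption for illustration, not taken from the paper.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def average_based_choice(payoff):
    """Each agent picks the action with the best mean payoff,
    assuming its partner acts uniformly at random."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_means = [sum(row) / n_cols for row in payoff]
    col_means = [
        sum(payoff[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    row = max(range(n_rows), key=row_means.__getitem__)
    col = max(range(n_cols), key=col_means.__getitem__)
    return row, col


def best_joint_action(payoff):
    """Exhaustively find the joint action with the highest payoff."""
    cells = [(i, j) for i in range(len(payoff)) for j in range(len(payoff[0]))]
    return max(cells, key=lambda ij: payoff[ij[0]][ij[1]])


def soft_value(payoff, alpha):
    """Soft value alpha * log(sum(exp(Q / alpha))) over all joint actions,
    computed with the usual max-shift for numerical stability."""
    flat = [q for row in payoff for q in row]
    m = max(flat)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in flat))


if __name__ == "__main__":
    r, c = average_based_choice(PAYOFF)
    print("average-based joint action:", (r, c), "payoff:", PAYOFF[r][c])
    br, bc = best_joint_action(PAYOFF)
    print("best joint action:", (br, bc), "payoff:", PAYOFF[br][bc])
    print("soft value at alpha=0.1:", soft_value(PAYOFF, 0.1))
```

In this toy game the averaging agents both choose their third action, while the best joint action is the first action for both. The continuous-control setting studied in the paper is more involved, but the failure mode has the same shape.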
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning
Methods: Weight Decay · Convolution · Adam · Experience Replay · Dense Connections · Batch Normalization · MADDPG · Q-Learning
