Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer
Yaodong Yang, Guangyong Chen, Hongyao Tang, Furui Liu, Danruo Deng, and Pheng Ann Heng

TL;DR
This paper introduces a novel multiagent Q-learning algorithm that reduces overestimation by combining ensemble methods with a hypernet regularizer, improving stability and performance in multiagent environments.
Contribution
It proposes a dual ensemble approach with a hypernet regularizer to address overestimation in both target estimation and online optimization in multiagent Q-learning.
Findings
Effectively reduces overestimation in multiagent Q-learning.
Improves stability and performance across multiple multiagent benchmarks.
Demonstrates superior results compared to existing methods.
Abstract
Overestimation in single-agent reinforcement learning has been extensively studied. In contrast, overestimation in the multiagent setting has received comparatively little attention although it increases with the number of agents and leads to severe learning instability. Previous works concentrate on reducing overestimation in the estimation process of target Q-value. They ignore the follow-up optimization process of online Q-network, thus making it hard to fully address the complex multiagent overestimation problem. To solve this challenge, in this study, we first establish an iterative estimation-optimization analysis framework for multiagent value-mixing Q-learning. Our analysis reveals that multiagent overestimation not only comes from the computation of target Q-value but also accumulates in the online Q-network's optimization. Motivated by it, we propose the Dual Ensembled…
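To make the abstract's claim about overestimation growing with the number of agents concrete, here is a minimal simulation sketch. It is not the paper's algorithm. It assumes a simple additive (sum) mixing of per-agent greedy values, true Q-values of zero, and independent Gaussian estimation noise; the function name `greedy_sum_bias` and all of its parameters are hypothetical. Setting `ensemble_size` above 1 takes the minimum over several independent estimates before the greedy max, a generic ensemble trick used here only to illustrate how an ensemble can damp the bias in the target estimate.

```python
import random


def greedy_sum_bias(n_agents, n_actions=5, noise_std=1.0,
                    ensemble_size=1, n_trials=2000, seed=0):
    """Average of sum_i max_a Q_i(a) when every true Q-value is 0.

    Because the true value is 0, the returned average is exactly the
    overestimation bias of the greedy target under additive mixing
    (an assumption of this sketch, not a detail taken from the paper).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        for _ in range(n_agents):
            # Each action's estimate is the minimum over `ensemble_size`
            # independent noisy estimates; the agent then acts greedily.
            total += max(
                min(rng.gauss(0.0, noise_std) for _ in range(ensemble_size))
                for _ in range(n_actions)
            )
    return total / n_trials


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        plain = greedy_sum_bias(n)
        ensembled = greedy_sum_bias(n, ensemble_size=2)
        print(f"agents={n}  plain bias={plain:.2f}  min-of-2 bias={ensembled:.2f}")
```

Under these assumptions the bias of the plain greedy target grows roughly linearly with the number of agents, while the min-over-ensemble variant stays noticeably lower. The sketch covers only the target-estimation side of the problem; the accumulation of overestimation during online Q-network optimization, which the abstract also highlights, is not modeled here.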
Taxonomy
Topics: Fuzzy Logic and Control Systems · Neural Networks and Applications
Methods: Softmax · Attention Is All You Need · HyperNetwork · Q-Learning
