Preventing Value Function Collapse in Ensemble Q-Learning by Maximizing Representation Diversity
Hassam Ullah Sheikh, Ladislau Bölöni

TL;DR
This paper introduces a regularization method to enhance diversity among ensemble Q-learning models, effectively reducing overestimation bias and improving performance over existing ensemble techniques.
Contribution
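It proposes a regularization technique, with five candidate regularization functions drawn from economics theory and consensus optimization, that keeps the members of a Q-learning ensemble from collapsing to the same point in parameter or representation space.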
Findings
Regularization significantly improves ensemble diversity.
The regularized ensembles outperform unregularized Maxmin and Ensemble Q-learning, as well as non-ensemble baselines.
The approach reduces overestimation bias effectively.
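To make the overestimation argument concrete, here is a minimal numerical sketch, not taken from the paper. It assumes every true Q-value is 0 and that each ensemble member's estimate carries independent Gaussian noise; the function name `maxmin_bias` and all parameter values are hypothetical. It compares a single estimator, a diverse ensemble using a Maxmin-style target (minimum over members, then maximum over actions), and a collapsed ensemble whose members are identical.

```python
import numpy as np

def maxmin_bias(n_members, collapsed=False, n_actions=10,
                n_trials=20000, noise_std=1.0, seed=0):
    """Mean of max_a min_i Q_i(a) when every true Q-value is 0.

    A positive return value is pure overestimation bias.
    """
    rng = np.random.default_rng(seed)
    # Noisy estimates with shape (trials, members, actions); true value is 0.
    q_hat = rng.normal(0.0, noise_std, size=(n_trials, n_members, n_actions))
    if collapsed:
        # Collapsed ensemble: every member is a copy of member 0.
        q_hat = np.repeat(q_hat[:, :1, :], n_members, axis=1)
    # Maxmin-style target: minimum over members, then maximum over actions.
    return float(q_hat.min(axis=1).max(axis=1).mean())

single = maxmin_bias(n_members=1)
diverse = maxmin_bias(n_members=2)
collapsed = maxmin_bias(n_members=2, collapsed=True)
print(f"single={single:.3f} diverse={diverse:.3f} collapsed={collapsed:.3f}")
```

Under these assumptions the diverse ensemble shows a smaller upward bias than the single estimator, while the collapsed ensemble behaves like the single estimator again, which is the failure mode the paper targets.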
Abstract
The classic DQN algorithm is limited by the overestimation bias of the learned Q-function. Subsequent algorithms have proposed techniques to reduce this problem, without fully eliminating it. Recently, the Maxmin and Ensemble Q-learning algorithms have used the different estimates provided by ensembles of learners to reduce the overestimation bias. Unfortunately, these learners can converge to the same point in the parametric or representation space, falling back to the classic single-network DQN. In this paper, we describe a regularization technique to maximize ensemble diversity in these algorithms. We propose and compare five regularization functions inspired by economics theory and consensus optimization. We show that the regularized approach significantly outperforms the Maxmin and Ensemble Q-learning algorithms as well as non-ensemble baselines.
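The abstract does not spell out the five regularization functions, so the following is only an illustrative sketch of the general idea. As an assumption, it uses the Gini coefficient, a standard inequality measure from economics, applied to the L2 norms of each member's representation vector, and subtracts it from the training loss so that minimizing the loss rewards diversity. The names `gini` and `regularized_loss`, the `weight` parameter, and the choice of L2 norms are hypothetical and are not claimed to match the authors' formulation.

```python
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative 1-D array (0 means all values equal)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    total = x.sum()
    if total == 0.0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard closed form for values sorted in ascending order.
    return float(2.0 * (ranks * x).sum() / (n * total) - (n + 1.0) / n)

def regularized_loss(td_loss, member_features, weight=0.1):
    """TD loss minus a diversity bonus (assumed form, not the paper's exact loss).

    member_features: array of shape (n_members, feature_dim), one
    representation vector per ensemble member.
    """
    norms = np.linalg.norm(member_features, axis=1)
    # Higher inequality among the norms means a more diverse ensemble,
    # so it lowers the loss being minimized.
    return td_loss - weight * gini(norms)

collapsed_features = np.ones((4, 8))                      # identical members
diverse_features = np.arange(1, 5)[:, None] * np.ones((4, 8))
loss_collapsed = regularized_loss(1.0, collapsed_features)
loss_diverse = regularized_loss(1.0, diverse_features)
```

In this sketch a collapsed ensemble receives no bonus, so its loss stays at the plain TD loss, while an ensemble whose representations differ in norm gets a lower loss. A norm-based measure cannot tell apart members with equal norms but different directions, which is one reason to compare several candidate functions.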
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Reinforcement Learning in Robotics
Methods: Convolution · Dense Connections · Deep Q-Network · Q-Learning
