Ensemble Bootstrapping for Q-Learning
Oren Peer, Chen Tessler, Nadav Merlis, Ron Meir

TL;DR
This paper introduces Ensemble Bootstrapped Q-Learning (EBQL), an algorithm that extends Double-Q-learning to ensembles in order to reduce the estimation bias of Q-learning, and reports improved performance on ATARI games.
Contribution
The paper proposes EBQL, a novel ensemble-based extension of double-Q-learning, with theoretical bias reduction analysis and empirical validation on ATARI games.
Findings
EBQL-like updates yield lower mean squared error (MSE) when estimating the maximal mean of a set of independent random variables.
Both over- and under-estimation biases can degrade performance.
EBQL outperforms other deep Q-learning algorithms on ATARI games.
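The first finding concerns a simple statistical problem: estimating the largest mean among several random variables from samples. The sketch below is a small Monte Carlo comparison of three estimators under assumptions that are not taken from the paper: Gaussian samples with unit variance, equal true means, and an "ensemble" estimator that picks the index on one fold of the data and evaluates it on the remaining folds. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def single_estimator(samples):
    """Max of the sample means (the estimator implicit in Q-learning's max)."""
    return samples.mean(axis=1).max()


def double_estimator(samples):
    """Select the index on one half of the samples, evaluate it on the other half."""
    half = samples.shape[1] // 2
    select, evaluate = samples[:, :half], samples[:, half:]
    best = select.mean(axis=1).argmax()
    return evaluate[best].mean()


def ensemble_estimator(samples, n_splits):
    """Assumed ensemble variant: select on one fold, evaluate on all other folds."""
    folds = np.array_split(samples, n_splits, axis=1)
    best = folds[0].mean(axis=1).argmax()
    rest = np.concatenate(folds[1:], axis=1)
    return rest[best].mean()


def simulate(true_means, n_samples=12, n_splits=4, n_trials=5000, seed=0):
    """Return the empirical bias and MSE of each estimator of max(true_means)."""
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    target = true_means.max()
    errors = {"single": [], "double": [], "ensemble": []}
    for _ in range(n_trials):
        # One row of unit-variance Gaussian samples per random variable.
        samples = rng.normal(true_means[:, None], 1.0,
                             size=(true_means.size, n_samples))
        errors["single"].append(single_estimator(samples) - target)
        errors["double"].append(double_estimator(samples) - target)
        errors["ensemble"].append(ensemble_estimator(samples, n_splits) - target)
    return {
        name: {"bias": float(np.mean(e)), "mse": float(np.mean(np.square(e)))}
        for name, e in errors.items()
    }


if __name__ == "__main__":
    for name, stats in simulate(np.zeros(10)).items():
        print(f"{name:>8}: bias={stats['bias']:+.3f}  mse={stats['mse']:.3f}")
```

With equal true means, the selected index is independent of the evaluation samples, so the double and ensemble estimators are unbiased in this particular setting, and the ensemble variant averages over more evaluation samples. Changing `true_means` to unequal values is one way to see under-estimation appear. This toy setup illustrates the idea only and does not reproduce the paper's analysis.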
Abstract
Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an under-estimation bias. Similar to over-estimation in Q-learning, in certain scenarios, the under-estimation bias may degrade performance. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance.…
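To make the three update rules concrete, here is a minimal tabular sketch. The Q-learning and Double-Q-learning updates follow their standard textbook forms. The `ebql_update` function is an assumption about what "a natural extension of Double-Q-learning to ensembles" could look like (one randomly chosen member selects the greedy action, and the average of the remaining members evaluates it); the abstract does not spell out the exact rule, so treat it as illustrative. Function names, the array layout, and default hyperparameters are hypothetical, and terminal states are ignored for brevity.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard Q-learning: the same table selects and evaluates the next action."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])


def double_q_update(QA, QB, s, a, r, s_next, rng, alpha=0.1, gamma=0.99):
    """Double-Q-learning: one table selects the action, the other evaluates it."""
    if rng.random() < 0.5:
        QA, QB = QB, QA  # swap roles so both tables get updated over time
    a_star = QA[s_next].argmax()
    target = r + gamma * QB[s_next, a_star]
    QA[s, a] += alpha * (target - QA[s, a])


def ebql_update(Q_ens, s, a, r, s_next, rng, alpha=0.1, gamma=0.99):
    """Assumed ensemble update on Q_ens of shape (K, n_states, n_actions).

    A random member k selects the greedy next action; the mean of the other
    K-1 members evaluates it. Returns the index of the updated member.
    """
    k = int(rng.integers(Q_ens.shape[0]))
    a_star = Q_ens[k, s_next].argmax()
    others = np.delete(Q_ens[:, s_next, a_star], k)  # values from members != k
    target = r + gamma * others.mean()
    Q_ens[k, s, a] += alpha * (target - Q_ens[k, s, a])
    return k
```

A usage example under the same assumptions: with `Q_ens = np.zeros((5, n_states, n_actions))` and `rng = np.random.default_rng(0)`, call `ebql_update(Q_ens, s, a, r, s_next, rng)` once per observed transition. With `K = 2` members this sketch reduces to the Double-Q-learning update above.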
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Adaptive Dynamic Programming Control
Methods: Q-Learning
