Ensemble Bootstrapping for Q-Learning

Oren Peer; Chen Tessler; Nadav Merlis; Ron Meir

arXiv:2103.00445 · cs.LG · April 21, 2021 · 1 citation

TL;DR

This paper introduces Ensemble Bootstrapped Q-Learning (EBQL), a new algorithm that reduces bias in Q-learning by extending double-Q-learning to ensembles, leading to improved performance in reinforcement learning tasks.

Contribution

The paper proposes EBQL, a novel ensemble-based extension of double-Q-learning, with theoretical bias reduction analysis and empirical validation on ATARI games.
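To illustrate what extending double-Q-learning to an ensemble can look like in the tabular case, here is a minimal Python sketch. It is a reading under stated assumptions, not the authors' reference code: the function name `ebql_update`, the list-of-tables layout, and the default `alpha` and `gamma` values are hypothetical, and the rule itself (one randomly chosen member selects the greedy next action, the average of the remaining members evaluates it) is inferred from the description rather than quoted from this page.

```python
import random


def ebql_update(q_ensemble, s, a, r, s_next, done,
                alpha=0.1, gamma=0.99, rng=random):
    """One tabular ensemble-bootstrapped update (illustrative sketch).

    q_ensemble: list of K >= 2 tables, each indexed as table[state][action].
    Returns the index of the ensemble member that was updated.
    """
    # Assumption: the member to update is drawn uniformly at random.
    k = rng.randrange(len(q_ensemble))
    q_k = q_ensemble[k]

    if done:
        # No bootstrapping from a terminal transition.
        target = r
    else:
        # Member k selects the greedy action in the next state ...
        next_values = q_k[s_next]
        a_star = max(range(len(next_values)), key=next_values.__getitem__)
        # ... and the remaining members evaluate that action on average.
        others = [q[s_next][a_star]
                  for j, q in enumerate(q_ensemble) if j != k]
        target = r + gamma * sum(others) / len(others)

    # Move only member k towards the target.
    q_k[s][a] += alpha * (target - q_k[s][a])
    return k
```

With two members this reduces to a double-Q-learning style update, in which one table selects the action and the other evaluates it.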

Findings

01. EBQL-like updates yield lower mean squared error when estimating the maximal mean of a set of independent random variables.

02. Both over- and under-estimation biases can degrade performance.

03. EBQL outperforms other deep Q-learning algorithms on ATARI games.

Abstract

Q-learning (QL), a common reinforcement learning algorithm, suffers from over-estimation bias due to the maximization term in the optimal Bellman operator. This bias may lead to sub-optimal behavior. Double-Q-learning tackles this issue by utilizing two estimators, yet results in an under-estimation bias. Similar to over-estimation in Q-learning, in certain scenarios, the under-estimation bias may degrade performance. In this work, we introduce a new bias-reduced algorithm called Ensemble Bootstrapped Q-Learning (EBQL), a natural extension of Double-Q-learning to ensembles. We analyze our method both theoretically and empirically. Theoretically, we prove that EBQL-like updates yield lower MSE when estimating the maximal mean of a set of independent random variables. Empirically, we show that there exist domains where both over- and under-estimation result in sub-optimal performance.…
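The maximal-mean estimation problem mentioned in the abstract can be explored with a small simulation. The Python sketch below compares three estimators of the largest mean among several independent Gaussian variables: the maximum of the sample means (the source of Q-learning's over-estimation), a two-fold split in the spirit of double-Q-learning, and a K-fold split in the spirit of EBQL. All function names, the fold construction, and the Gaussian setup are assumptions made for illustration; the paper's exact estimator definitions and proof conditions may differ.

```python
import random
from statistics import fmean


def single_estimator(samples):
    """Max over per-variable sample means, using all samples at once."""
    return max(fmean(xs) for xs in samples)


def ensemble_estimator(samples, k):
    """Split each variable's samples into k disjoint folds.

    Fold 0 selects the index with the largest sample mean; the other
    k - 1 folds evaluate that index. k = 2 gives a double-estimator-style
    split. Each variable needs at least k samples.
    """
    # fold_means[j][i] = mean of variable i restricted to fold j
    fold_means = [[fmean(xs[j::k]) for xs in samples] for j in range(k)]
    selector = fold_means[0]
    best = max(range(len(selector)), key=selector.__getitem__)
    return fmean([fold_means[j][best] for j in range(1, k)])


def simulate(means, n_samples, k, trials, sigma=1.0, seed=0):
    """Return {estimator name: (empirical bias, empirical MSE)}."""
    rng = random.Random(seed)
    target = max(means)  # the quantity every estimator is trying to hit
    errors = {"single": [], "double": [], "ensemble": []}
    for _ in range(trials):
        samples = [[rng.gauss(mu, sigma) for _ in range(n_samples)]
                   for mu in means]
        errors["single"].append(single_estimator(samples) - target)
        errors["double"].append(ensemble_estimator(samples, 2) - target)
        errors["ensemble"].append(ensemble_estimator(samples, k) - target)
    return {name: (fmean(e), fmean([x * x for x in e]))
            for name, e in errors.items()}


if __name__ == "__main__":
    results = simulate([0.0] * 10, n_samples=20, k=5, trials=2000)
    for name, (bias, mse) in results.items():
        print(f"{name:9s} bias={bias:+.3f} mse={mse:.3f}")
```

Varying `means`, `n_samples`, and `k` shows how the sign and size of the bias depend on the setting; the numbers produced are properties of this toy setup only and are not results from the paper.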

Videos

Ensemble Bootstrapping for Q-Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Adaptive Dynamic Programming Control

Methods: Q-Learning