Factors of Influence of the Overestimation Bias of Q-Learning

Julius Wagenbach; Matthia Sabatelli

arXiv:2210.05262 · stat.ML · October 12, 2022

TL;DR

This paper investigates how learning rate, discount factor, and reward signals affect overestimation bias in Q-Learning, demonstrating that parameter tuning and reward smoothing can improve value estimate accuracy.

Contribution

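It identifies the learning rate, the discount factor, and the reward signal as parameters that influence overestimation bias, and proposes using an exponential moving average of the reward in Q-Learning's temporal difference target, which yields value estimates more accurate than those of several popular model-free methods that address the same bias.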

Findings

01

The learning rate, the discount factor, and the reward signal all significantly influence overestimation.

02

Careful tuning of the learning rate and the discount factor reduces the bias.

03

Reward smoothing improves value estimate accuracy.
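
As background for the findings above, the sketch below illustrates where overestimation in Q-Learning comes from: taking a maximum over noisy value estimates is biased upward even when every true value is identical. This is a generic illustration, not an experiment from the paper; the function name `max_operator_bias` and all of its parameters are hypothetical.

```python
import numpy as np


def max_operator_bias(n_actions=10, noise_std=1.0, n_trials=10_000, seed=0):
    """Estimate the upward bias of max over noisy action-value estimates.

    Assumption for illustration: every action has a true value of 0, so the
    true maximum is 0 and any positive result is pure overestimation.
    """
    rng = np.random.default_rng(seed)
    # Each row is one set of noisy estimates of the (all-zero) true values.
    noisy_q = rng.normal(0.0, noise_std, size=(n_trials, n_actions))
    # Average of the per-row maximum; the true maximum is 0.
    return float(noisy_q.max(axis=1).mean())


if __name__ == "__main__":
    for k in (1, 2, 10):
        print(k, "actions -> estimated bias", round(max_operator_bias(n_actions=k), 3))
```

With a single action the estimate stays close to zero, and it grows as more noisy estimates enter the maximum, which is the effect that the parameters studied here amplify or dampen.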

Abstract

We study whether the learning rate $α$, the discount factor $γ$, and the reward signal $r$ influence the overestimation bias of the Q-Learning algorithm. Our preliminary results, obtained in stochastic environments that require neural networks as function approximators, show that all three parameters influence overestimation significantly. By carefully tuning $α$ and $γ$, and by using an exponential moving average of $r$ in Q-Learning's temporal difference target, we show that the algorithm can learn value estimates that are more accurate than those of several other popular model-free methods that have addressed its overestimation bias in the past.
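
The idea of placing an exponential moving average of $r$ in the temporal difference target can be sketched as follows. This is a minimal tabular sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the average, so the single global running average `r_bar`, the smoothing coefficient `beta`, and the function name `ema_q_update` are all hypothetical, and the paper itself works with neural network function approximators rather than a table.

```python
import numpy as np


def ema_q_update(q, r_bar, s, a, r, s_next, done,
                 alpha=0.1, gamma=0.99, beta=0.9):
    """One tabular Q-Learning step with a smoothed reward in the TD target.

    q      : array of shape [n_states, n_actions], updated in place
    r_bar  : running exponential moving average of the reward (assumed global)
    beta   : hypothetical smoothing coefficient, not taken from the paper
    """
    # Assumed smoothing rule: standard exponential moving average of r.
    r_bar = beta * r_bar + (1.0 - beta) * r
    # Standard Q-Learning bootstrap term; zero on terminal transitions.
    bootstrap = 0.0 if done else gamma * np.max(q[s_next])
    # The smoothed reward replaces the raw reward in the TD target.
    td_target = r_bar + bootstrap
    q[s, a] += alpha * (td_target - q[s, a])
    return q, r_bar


if __name__ == "__main__":
    q = np.zeros((2, 2))
    q, r_bar = ema_q_update(q, 0.0, s=0, a=1, r=1.0, s_next=1, done=False,
                            alpha=0.5, gamma=0.9, beta=0.5)
    print(q[0, 1], r_bar)
```

Setting `beta=0.0` recovers the usual Q-Learning target with the raw reward, which makes it easy to compare the two variants on the same environment.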

Code & Models

Repositories

overestimationbias/factors-of-influence-of-the-overestimation-bias
Official

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and Algorithms · Fault Detection and Control Systems

Methods: Q-Learning