Deep Reinforcement Learning with Weighted Q-Learning
Andrea Cini, Carlo D'Eramo, Jan Peters, Cesare Alippi

TL;DR
This paper introduces Deep Weighted Q-Learning, a method that reduces bias in deep reinforcement learning by estimating action value uncertainties using Dropout-based Bayesian methods, leading to improved performance.
Contribution
The paper extends Weighted Q-Learning to deep neural networks, using Dropout as an approximation of Bayesian inference to estimate action value uncertainty and reduce overestimation bias in deep reinforcement learning (DRL).
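To make the idea concrete, here is a minimal sketch of how Dropout could be used to sample action values and turn them into a weighted estimate. It is not the authors' implementation: the tiny network, its random untrained parameters, the fixed dropout rate, and the helper names `mc_dropout_q_samples` and `weighted_value` are all hypothetical. The weighting rule (each action weighted by how often it comes out on top across the stochastic passes) is an assumption based on the general Weighted Q-Learning idea of weighting actions by their probability of being the maximum.

```python
import numpy as np

def mc_dropout_q_samples(state, params, rng, n_passes=100, drop_prob=0.1):
    """Run several stochastic forward passes with dropout kept active.

    Returns an array of shape (n_passes, n_actions) holding one sampled
    vector of action values per pass.
    """
    W1, b1, W2, b2 = params
    samples = []
    for _ in range(n_passes):
        hidden = np.maximum(0.0, state @ W1 + b1)          # ReLU hidden layer
        mask = rng.random(hidden.shape) >= drop_prob       # keep units with prob 1 - drop_prob
        hidden = hidden * mask / (1.0 - drop_prob)         # inverted-dropout scaling
        samples.append(hidden @ W2 + b2)
    return np.stack(samples)

def weighted_value(q_samples):
    """Weight each action's mean value by how often it was the sampled maximum."""
    n_passes, n_actions = q_samples.shape
    winners = np.argmax(q_samples, axis=1)
    weights = np.bincount(winners, minlength=n_actions) / n_passes
    return float(weights @ q_samples.mean(axis=0)), weights

# Hypothetical toy setup: random, untrained parameters and a random state.
rng = np.random.default_rng(0)
state_dim, hidden_dim, n_actions = 4, 32, 3
params = (
    rng.normal(size=(state_dim, hidden_dim)) * 0.5,
    np.zeros(hidden_dim),
    rng.normal(size=(hidden_dim, n_actions)) * 0.5,
    np.zeros(n_actions),
)
state = rng.normal(size=state_dim)

q_samples = mc_dropout_q_samples(state, params, rng)
value, weights = weighted_value(q_samples)
```

Because the weights are non-negative and sum to one, the weighted value always lies between the smallest and largest mean action value, so it can never exceed the plain maximum over the means.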
Findings
Reduces bias in action value estimates compared to baseline algorithms.
Improves performance on benchmark tasks.
Provides more reliable action value estimates.
Abstract
Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-Learning is known to be positively biased since it learns by using the maximum over noisy estimates of expected values. Systematic overestimation of the action values, coupled with the inherently high variance of DRL methods, can lead to incrementally accumulating errors, causing learning algorithms to diverge. Ideally, we would like DRL agents to take into account their own uncertainty about the optimality of each action, and be able to exploit it to make more informed estimations of the expected return. In this regard, Weighted Q-Learning (WQL) effectively reduces bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action values,…
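The positive bias of the maximum over noisy estimates can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's experiment: every action has a true expected value of 0 (so the true maximum is 0), rewards are Gaussian, and the function names are hypothetical. The weighted estimator assumes weights equal to the probability that each action is the best one, approximated here by sampling from a Gaussian around each sample mean; the truncated abstract above does not spell out the weighting rule, so treat that part as an assumption.

```python
import numpy as np

def max_estimate(sample_means):
    """Standard Q-learning style estimate: maximum over noisy means."""
    return float(np.max(sample_means))

def weighted_estimate(sample_means, std_errs, rng, n_draws=2000):
    """Weighted sum of means; weights approximate P(action is the maximum)."""
    draws = rng.normal(sample_means, std_errs, size=(n_draws, len(sample_means)))
    winners = np.argmax(draws, axis=1)
    weights = np.bincount(winners, minlength=len(sample_means)) / n_draws
    return float(weights @ sample_means)

def run_experiment(n_actions=5, n_samples=10, n_trials=500, seed=0):
    """Average both estimators over many trials; the true maximum is 0."""
    rng = np.random.default_rng(seed)
    max_vals, weighted_vals = [], []
    for _ in range(n_trials):
        # All actions share the same true mean of 0 and unit-variance noise.
        rewards = rng.normal(0.0, 1.0, size=(n_samples, n_actions))
        means = rewards.mean(axis=0)
        std_errs = rewards.std(axis=0, ddof=1) / np.sqrt(n_samples)
        max_vals.append(max_estimate(means))
        weighted_vals.append(weighted_estimate(means, std_errs, rng))
    return float(np.mean(max_vals)), float(np.mean(weighted_vals))

# Since the true maximum is 0, these averages are the estimators' biases.
max_bias, weighted_bias = run_experiment()
print(f"max estimator bias:      {max_bias:.3f}")
print(f"weighted estimator bias: {weighted_bias:.3f}")
```

In this toy setting the maximum estimator is biased upward, while the weighted estimate, being a convex combination of the same sample means, cannot exceed the maximum on any trial and so ends up with a smaller average bias.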
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
Methods: Concrete Dropout · Q-Learning · Dropout
