Deep Reinforcement Learning with Weighted Q-Learning

Andrea Cini; Carlo D'Eramo; Jan Peters; Cesare Alippi

arXiv:2003.09280 · cs.LG · June 14, 2022 · 1 citation

TL;DR

This paper introduces Deep Weighted Q-Learning, a method that reduces bias in deep reinforcement learning by estimating action value uncertainties using Dropout-based Bayesian methods, leading to improved performance.

Contribution

It extends Weighted Q-Learning to deep neural networks by using Dropout as an approximation of Bayesian inference to better estimate uncertainties and reduce overestimation bias in DRL.
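As a rough illustration of the idea, the sketch below draws several stochastic forward passes through a Dropout layer and combines the per-action means with weights equal to the fraction of passes in which each action had the largest value. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear Q-head, the names `dropout_q_samples` and `weighted_estimate`, the feature and weight values, and the choice of weights as empirical "probability of being the maximum" are all hypothetical and chosen only for illustration.

```python
import random


def dropout_q_samples(features, weights, n_passes, keep_prob, rng):
    """Run n_passes stochastic forward passes of a toy linear Q-head with dropout.

    Hypothetical helper: `weights` has one row per action, one column per feature.
    """
    samples = []
    for _ in range(n_passes):
        # Inverted dropout: zero each feature with prob (1 - keep_prob), rescale the rest.
        masked = [f / keep_prob if rng.random() < keep_prob else 0.0 for f in features]
        samples.append([sum(w * m for w, m in zip(row, masked)) for row in weights])
    return samples


def weighted_estimate(samples):
    """Combine per-action sample means using empirical argmax frequencies as weights."""
    n = len(samples)
    n_actions = len(samples[0])
    means = [sum(s[a] for s in samples) / n for a in range(n_actions)]
    wins = [0] * n_actions
    for s in samples:
        # Count which action is maximal in this stochastic pass.
        wins[max(range(n_actions), key=lambda a: s[a])] += 1
    probs = [w / n for w in wins]
    estimate = sum(p * m for p, m in zip(probs, means))
    return estimate, probs, means


rng = random.Random(0)
features = [1.0, 0.5, -0.3, 0.8]           # illustrative state features
weights = [[0.2, 0.1, 0.0, 0.3],           # illustrative Q-head, 3 actions
           [0.1, 0.4, 0.2, 0.1],
           [0.3, -0.1, 0.1, 0.2]]
samples = dropout_q_samples(features, weights, n_passes=200, keep_prob=0.8, rng=rng)
estimate, probs, means = weighted_estimate(samples)
print(estimate, probs, means)
```

Because the weights are non-negative and sum to one, the resulting estimate is a convex combination of the per-action means and so can never exceed the largest of them, which is what makes it less prone to overestimation than taking the maximum directly.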

Findings

01. Reduces bias compared to baseline algorithms.

02. Improves performance on benchmark tasks.

03. Provides more reliable action value estimates.

Abstract

Reinforcement learning algorithms based on Q-Learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-Learning is known to be positively biased, since it learns by taking the maximum over noisy estimates of expected values. Systematic overestimation of the action values, coupled with the inherently high variance of DRL methods, can lead to an incremental accumulation of errors, causing learning algorithms to diverge. Ideally, we would like DRL agents to take into account their own uncertainty about the optimality of each action, and to be able to exploit it to make more informed estimates of the expected return. In this regard, Weighted Q-Learning (WQL) effectively reduces bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action values,…
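The positive bias mentioned in the abstract can be reproduced with a small simulation. In the sketch below, every action has a true value of exactly 0, yet the maximum over the noisy sample means is positive on average. The function name `max_estimator_bias`, the Gaussian noise model, and the parameter values are assumptions made only for this illustration; they are not taken from the paper.

```python
import random


def max_estimator_bias(n_actions=5, n_samples=10, n_trials=2000, seed=0):
    """Average of max_a(sample mean of action a) when every true action value is 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # One noisy sample mean per action; the true mean of each is 0.
        means = [
            sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
            for _ in range(n_actions)
        ]
        total += max(means)
    return total / n_trials


bias = max_estimator_bias()
print(f"average max over noisy estimates: {bias:.3f} (true max is 0)")
```

Since the true maximum is 0, any positive average here is pure overestimation introduced by the max operator; it grows with the number of actions and with the noise in each estimate.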

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference

Methods: Concrete Dropout · Q-Learning · Dropout