Taming the Noise in Reinforcement Learning via Soft Updates

Roy Fox; Ari Pakman; Naftali Tishby

arXiv:1512.08562 · cs.LG · February 1, 2018 · 67 cites

TL;DR

G-learning is an off-policy reinforcement learning algorithm that regularizes value estimates by penalizing deterministic policies early in learning, which reduces estimation bias and speeds convergence in noisy environments. It can also incorporate prior domain knowledge when available.

Contribution

The paper introduces G-learning, a new off-policy algorithm that penalizes deterministic policies early on, reducing bias and improving learning speed in noisy settings.

Findings

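01 G-learning reduces the bias of value-function estimates, leading to faster convergence to the optimal value and the optimal policy than Q-learning in noisy environments.

02 It allows prior domain knowledge to be incorporated naturally when available.

03 Its stochastic nature lets it avoid some exploration costs.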

Abstract

Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the early stages of learning in noisy environments, because much effort is spent unlearning biased estimates of the state-action value function. The bias results from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies in the beginning of the learning process. We show that this method reduces the bias of the value-function estimation, leading to faster convergence to the optimal value and the optimal policy. Moreover, G-learning enables the natural incorporation of prior domain knowledge, when available. The stochastic nature of G-learning also makes it avoid some exploration costs, a property usually attributed only to…
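The bias described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes every action has true value 0 and Gaussian estimation noise, and it uses a log-mean-exp operator with a hypothetical inverse temperature `beta` as a stand-in for a soft, stochastic-policy update under a uniform prior. The function names and parameters are illustrative only.

```python
import math
import random


def soft_max_value(estimates, beta):
    """Log-mean-exp of the estimates (beta > 0).

    Close to the plain mean for small beta, close to the hard max for large beta.
    The largest estimate is subtracted first for numerical stability.
    """
    top = max(estimates)
    mean_exp = sum(math.exp(beta * (x - top)) for x in estimates) / len(estimates)
    return top + math.log(mean_exp) / beta


def estimation_bias(num_actions=5, noise_std=1.0, beta=0.5, trials=2000, seed=0):
    """Average hard-max and soft values over noisy estimates of all-zero true values."""
    rng = random.Random(seed)
    hard_total = 0.0
    soft_total = 0.0
    for _ in range(trials):
        # Every action has true value 0, so any nonzero average is pure bias.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        hard_total += max(estimates)
        soft_total += soft_max_value(estimates, beta)
    return hard_total / trials, soft_total / trials


if __name__ == "__main__":
    hard, soft = estimation_bias()
    print(f"hard max bias: {hard:.3f}, soft bias: {soft:.3f}")
```

Because the log-mean-exp never exceeds the largest estimate, the soft value is always at most the hard maximum, so its average bias is smaller. Raising `beta` moves the soft value toward the hard maximum, which loosely mirrors the idea of penalizing deterministic policies only at the beginning of learning.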

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research