The Uncertainty Bellman Equation and Exploration

Brendan O'Donoghue; Ian Osband; Remi Munos; Volodymyr Mnih

arXiv:1709.05380·cs.AI·October 23, 2018·59 cites

TL;DR

This paper introduces an uncertainty Bellman equation (UBE) that propagates uncertainty in reinforcement learning, providing a scalable exploration method that improves performance on Atari games.

Contribution

It formulates an uncertainty Bellman equation whose unique fixed point is an upper bound on the variance of the posterior distribution of the Q-values induced by any policy, and uses that bound to drive scalable exploration in complex RL environments.

Findings

01

The UBE bound can be much tighter than traditional count-based bonuses, which compound standard deviation rather than variance.

02

Replacing epsilon-greedy with UBE improves DQN performance on Atari.

03

Unlike several existing approaches to optimism, the method scales naturally to large systems.
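
The comparison with count-based bonuses comes down to how per-step uncertainty accumulates along a trajectory: the abstract describes count-based bonuses as compounding standard deviation, whereas the UBE compounds variance. The following is a minimal arithmetic sketch of that difference, not the paper's proof. It assumes independent per-step uncertainties of equal size, and the function names and the numbers (a horizon of 100, a per-step standard deviation of 0.1) are hypothetical.

```python
import math


def compounded_std(step_stds):
    """Add up per-step standard deviations, as a per-step bonus would."""
    return sum(step_stds)


def compounded_variance(step_stds):
    """Add up per-step variances and take a single square root at the end."""
    return math.sqrt(sum(s * s for s in step_stds))


# Hypothetical trajectory: 100 steps, each with standard deviation 0.1.
horizon = 100
step_stds = [0.1] * horizon

std_total = compounded_std(step_stds)       # grows linearly in the horizon
var_total = compounded_variance(step_stds)  # grows like sqrt(horizon)

print(f"sum of standard deviations: {std_total:.3f}")
print(f"sqrt of summed variances:   {var_total:.3f}")
```

Under these assumptions the first quantity is roughly 10 and the second roughly 1, so the gap widens as the horizon grows. For non-negative inputs the square root of the summed variances never exceeds the sum of the standard deviations.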

Abstract

We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution of the Q-values induced by any policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with…
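
To make the idea of connecting uncertainty across time-steps concrete, here is a minimal tabular sketch of a Bellman-style backward recursion for uncertainty. It assumes a finite horizon, no discounting, a stationary local uncertainty term, and known mean transition probabilities. The function `solve_ube`, its arguments, and the three-state chain are hypothetical illustrations; the exact form of the recursion is an assumption and may differ from the equation in the paper.

```python
def solve_ube(local_uncertainty, mean_transitions, policy, horizon):
    """Backward recursion: u(s, a) = nu(s, a) + sum_s' P(s' | s, a) * sum_a' pi(a' | s') * u'(s', a').

    local_uncertainty[s][a]   : assumed local uncertainty nu at (s, a)
    mean_transitions[s][a][t] : assumed mean probability of moving from s to t under a
    policy[s][a]              : probability that the policy picks a in s
    Returns the propagated uncertainty at the first time-step.
    """
    n_states = len(local_uncertainty)
    n_actions = len(local_uncertainty[0])
    # Beyond the horizon there is nothing left to be uncertain about.
    u_next = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(horizon):
        # Policy-weighted uncertainty of each successor state.
        v_next = [
            sum(policy[s][a] * u_next[s][a] for a in range(n_actions))
            for s in range(n_states)
        ]
        u_next = [
            [
                local_uncertainty[s][a]
                + sum(mean_transitions[s][a][t] * v_next[t] for t in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
    return u_next


# Hypothetical chain 0 -> 1 -> 2 (absorbing), one action, uncertainty only at state 2.
nu = [[0.0], [0.0], [1.0]]
transitions = [[[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]], [[0.0, 0.0, 1.0]]]
pi = [[1.0], [1.0], [1.0]]

u = solve_ube(nu, transitions, pi, horizon=3)
print(u)  # [[1.0], [2.0], [3.0]]
```

State 0 has zero local uncertainty but a propagated uncertainty of 1.0, because a trajectory starting there reaches the uncertain state 2 within the horizon. This is the sense in which the exploratory benefit extends beyond an individual time-step.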

Code & Models

Repositories

stratismarkou/sample-efficient-bayesian-rl

Taxonomy

Topics: Simulation Techniques and Applications · Reinforcement Learning in Robotics · Reservoir Engineering and Simulation Methods