The Uncertainty Bellman Equation and Exploration
Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

TL;DR
This paper introduces an uncertainty Bellman equation (UBE) that propagates uncertainty in reinforcement learning, providing a scalable exploration method that improves performance on Atari games.
Contribution
It formulates a novel uncertainty Bellman equation whose unique fixed point is an upper bound on the variance of the posterior distribution of the Q-values induced by any policy, enabling scalable exploration in complex RL environments.
Findings
The UBE bound can be much tighter than traditional count-based bonuses, which compound standard deviation rather than variance.
Replacing epsilon-greedy with UBE improves DQN performance on Atari.
The method scales well to large, complex systems.
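The first finding turns on the difference between compounding standard deviations and compounding variances. The toy calculation below is only an illustration of that arithmetic, not the paper's derivation: it assumes independent per-step uncertainties, and the trajectory length and the value 0.5 are hypothetical numbers chosen for the example.

```python
import math

# Hypothetical per-step posterior standard deviations along a 10-step trajectory.
sigmas = [0.5] * 10

# Compounding standard deviations: the per-step bonuses add up linearly.
std_sum = sum(sigmas)

# Compounding variances instead, and taking the square root once at the end.
var_sum_sqrt = math.sqrt(sum(s * s for s in sigmas))

print(f"sum of standard deviations:  {std_sum:.3f}")
print(f"sqrt of summed variances:    {var_sum_sqrt:.3f}")
```

Because the square root of a sum of non-negative terms never exceeds the sum of their square roots, the variance-based quantity is never larger, and the gap grows with the number of steps.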
Abstract
We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution of the Q-values induced by any policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with…
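The abstract describes the UBE by analogy with the ordinary Bellman equation but does not write it out. The sketch below is one possible tabular reading under stated assumptions, not the authors' implementation: it assumes a discounted form u = nu + gamma^2 * P_pi u, where `nu` is a given per-state-action "local uncertainty" term. The function name `solve_ube`, the symbol `nu`, the gamma^2 discount, and the two-state example are all assumptions made for illustration.

```python
import numpy as np

def solve_ube(nu, P, pi, gamma, iters=1000, tol=1e-10):
    """Fixed-point iteration for an assumed tabular uncertainty Bellman equation.

    nu: (S, A) local uncertainty per state-action pair (assumed given)
    P:  (S, A, S) transition probabilities P[s, a, s']
    pi: (S, A) policy probabilities pi[s, a]
    """
    u = np.zeros_like(nu)
    for _ in range(iters):
        # Expected uncertainty at each next state under the policy.
        u_next_state = (pi * u).sum(axis=1)            # shape (S,)
        # Local uncertainty plus discounted expected next-step uncertainty.
        u_new = nu + gamma**2 * (P @ u_next_state)     # shape (S, A)
        if np.max(np.abs(u_new - u)) < tol:
            return u_new
        u = u_new
    return u

# Hypothetical example: two absorbing states, one action each.
gamma = 0.9
nu = np.array([[1.0], [2.0]])
P = np.zeros((2, 1, 2))
P[0, 0, 0] = 1.0
P[1, 0, 1] = 1.0
pi = np.ones((2, 1))

u = solve_ube(nu, P, pi, gamma)
print(u)
```

In this example each state loops back to itself, so the fixed point is nu / (1 - gamma^2), which makes the result easy to check by hand. The structure mirrors policy evaluation, with the reward replaced by a local uncertainty term.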
Taxonomy
Topics: Simulation Techniques and Applications · Reinforcement Learning in Robotics · Reservoir Engineering and Simulation Methods
