Towards Tractable Optimism in Model-Based Reinforcement Learning
Aldo Pacchiano, Philip J. Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

TL;DR
This paper introduces a scalable, noise-augmented approach to optimistic model-based reinforcement learning that balances optimism and estimation error, providing theoretical regret bounds and demonstrating effectiveness in deep RL tasks.
Contribution
It reinterprets scalable optimistic RL algorithms as solving a tractable noise-augmented MDP, deriving regret bounds and analyzing their performance in deep RL.
Findings
Regret bound of \( \tilde{\mathcal{O}}(|\mathcal{S}| H \sqrt{|\mathcal{A}| T}) \) with Gaussian noise.
Estimation error significantly impacts deep RL performance.
Reducing estimation error enables matching state-of-the-art results in continuous control.
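To see why a bound of this shape is considered competitive, the short sketch below evaluates its leading term, assuming the form |S| H sqrt(|A| T) with constants and logarithmic factors dropped. The function name and the example sizes are hypothetical and chosen only for illustration. Because the term grows like sqrt(T), the average regret per environment step shrinks as T grows.

```python
import math


def regret_bound(num_states, num_actions, horizon, total_steps):
    """Leading term |S| * H * sqrt(|A| * T), ignoring constants and log factors."""
    return num_states * horizon * math.sqrt(num_actions * total_steps)


# Hypothetical problem sizes, used only to show the scaling in T.
for total_steps in (10_000, 1_000_000):
    bound = regret_bound(num_states=10, num_actions=4, horizon=5, total_steps=total_steps)
    print(total_steps, bound, bound / total_steps)
```

Multiplying T by 100 multiplies this term by only 10, so the per-step value falls by a factor of 10.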
Abstract
The principle of optimism in the face of uncertainty is prevalent throughout sequential decision-making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate the true value function (optimism) but not by so much that it is inaccurate (estimation error). In the tabular setting, many state-of-the-art methods produce the required optimism through approaches which are intractable when scaling to deep RL. We re-interpret these scalable optimistic model-based algorithms as solving a tractable noise-augmented MDP. This formulation achieves a competitive regret bound: \( \tilde{\mathcal{O}}(|\mathcal{S}| H \sqrt{|\mathcal{A}| T}) \) when augmenting using Gaussian noise, where \(T\) is the total number of environment steps. We also explore how this trade-off changes in the deep RL setting, where we show empirically that…
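As a rough illustration of what planning in a noise-augmented MDP could look like in the tabular setting, here is a minimal sketch. It is not the authors' algorithm: placing the Gaussian noise on the estimated rewards, the scale `sigma / sqrt(count)`, and the names `noise_augmented_value_iteration`, `P_hat`, `R_hat`, and `counts` are all assumptions made for this example.

```python
import numpy as np


def noise_augmented_value_iteration(P_hat, R_hat, counts, horizon, sigma=1.0, rng=None):
    """Finite-horizon planning on an estimated MDP with Gaussian-perturbed rewards.

    P_hat:  (S, A, S) estimated transition probabilities
    R_hat:  (S, A) estimated mean rewards
    counts: (S, A) visit counts for each state-action pair
    """
    rng = np.random.default_rng() if rng is None else rng
    num_states, num_actions = R_hat.shape

    # Assumed noise scale: shrinks as a state-action pair is visited more often.
    scale = sigma / np.sqrt(np.maximum(counts, 1))
    R_tilde = R_hat + rng.normal(size=(num_states, num_actions)) * scale

    V = np.zeros(num_states)
    policy = np.zeros((horizon, num_states), dtype=int)
    for h in reversed(range(horizon)):
        # (S, A, S) @ (S,) -> (S, A): expected next-step value per state-action pair.
        Q = R_tilde + P_hat @ V
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V


# Tiny hypothetical MDP: two states that never change, reward 1 for action 1 in state 0.
P_hat = np.zeros((2, 2, 2))
P_hat[0, :, 0] = 1.0
P_hat[1, :, 1] = 1.0
R_hat = np.array([[0.0, 1.0], [0.0, 0.0]])
counts = np.ones((2, 2))
policy, V = noise_augmented_value_iteration(P_hat, R_hat, counts, horizon=3, sigma=0.5)
```

In this sketch the perturbation is largest for rarely visited state-action pairs, which is one way to trade optimism against estimation error: a larger `sigma` encourages exploration but makes the planned values less accurate.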
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Model Reduction and Neural Networks
