Towards Tractable Optimism in Model-Based Reinforcement Learning

Aldo Pacchiano; Philip J. Ball; Jack Parker-Holder; Krzysztof Choromanski; Stephen Roberts

arXiv:2006.11911·cs.LG·December 7, 2021·1 cites

TL;DR

This paper introduces a scalable, noise-augmented approach to optimistic model-based reinforcement learning that balances optimism and estimation error, providing theoretical regret bounds and demonstrating effectiveness in deep RL tasks.

Contribution

It reinterprets scalable optimistic RL algorithms as solving a tractable noise-augmented MDP, deriving regret bounds and analyzing their performance in deep RL.

Findings

01

Regret bound of $\tilde{O}(|S| H \sqrt{|A| T})$ with Gaussian noise.

02

Estimation error significantly impacts deep RL performance.

03

Reducing estimation error enables matching state-of-the-art results in continuous control.

Abstract

The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL). To be successful, an optimistic RL algorithm must over-estimate the true value function (optimism) but not by so much that it is inaccurate (estimation error). In the tabular setting, many state-of-the-art methods produce the required optimism through approaches which are intractable when scaling to deep RL. We re-interpret these scalable optimistic model-based algorithms as solving a tractable noise augmented MDP. This formulation achieves a competitive regret bound: $\tilde{O}(|S| H \sqrt{|A| T})$ when augmenting using Gaussian noise, where $T$ is the total number of environment steps. We also explore how this trade-off changes in the deep RL setting, where we show empirically that…
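To make the idea of a noise-augmented MDP concrete, the following is a minimal tabular sketch, not the authors' algorithm. It assumes a finite-horizon MDP with an empirical model, and it adds zero-mean Gaussian noise to the estimated rewards with a standard deviation that shrinks as 1/sqrt(n) in the visit count n. The function name `noise_augmented_value_iteration`, the `sigma` scale, and the exact noise schedule are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def noise_augmented_value_iteration(p_hat, r_hat, counts, horizon, sigma=1.0, rng=None):
    """Finite-horizon value iteration on an empirical MDP with Gaussian reward noise.

    p_hat:   (S, A, S) empirical transition probabilities
    r_hat:   (S, A) empirical mean rewards
    counts:  (S, A) visit counts for each state-action pair
    horizon: number of steps H in an episode
    sigma:   hypothetical noise scale (an assumption of this sketch)
    """
    rng = np.random.default_rng() if rng is None else rng
    n_states, _ = r_hat.shape

    # Assumed schedule: noise shrinks as 1/sqrt(n); unvisited pairs are treated as n = 1.
    noise_std = sigma / np.sqrt(np.maximum(counts, 1))

    values = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)

    # Backward induction over the horizon, with fresh noise drawn at every step.
    for h in reversed(range(horizon)):
        noise = rng.normal(0.0, noise_std)      # shape (S, A)
        q = r_hat + noise + p_hat @ values      # shape (S, A)
        policy[h] = q.argmax(axis=1)
        values = q.max(axis=1)

    return policy, values


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP with uniform transitions; action 1 always pays 1.
    p_hat = np.full((2, 2, 2), 0.5)
    r_hat = np.array([[0.0, 1.0], [0.0, 1.0]])
    counts = np.array([[1, 50], [3, 200]])
    policy, values = noise_augmented_value_iteration(
        p_hat, r_hat, counts, horizon=3, rng=np.random.default_rng(0)
    )
    print(policy)
    print(values)
```

With `sigma=0` the routine reduces to ordinary value iteration on the empirical model. Larger `sigma` makes rarely visited state-action pairs more likely to look attractive, which is the optimism side of the trade-off, at the cost of a noisier value estimate, which is the estimation-error side.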

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Model Reduction and Neural Networks