Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Jasmine Bayrooti, Carl Henrik Ek, Amanda Prorok

TL;DR
This paper introduces a model-based reinforcement learning method using optimistic Thompson sampling that leverages joint uncertainty over transitions and rewards, leading to more efficient exploration and faster learning in complex control tasks.
Contribution
It presents the first model structure capable of reasoning about joint uncertainty over transitions and rewards, enabling theoretically grounded optimistic exploration.
Findings
Accelerates learning in sparse reward environments
Highlights the importance of model uncertainty in exploration
Effective in complex continuous control tasks
Abstract
Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards, with optimistic exploration emerging as a promising direction aligned with this idea and enabling sample-efficient reinforcement learning. However, existing methods overlook a crucial aspect: the need for optimism to be informed by a belief connecting the reward and state. To address this, we propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling. Our model structure is the first that allows for reasoning about joint uncertainty over transitions and rewards. We apply our method on a set of MuJoCo and VMAS continuous control tasks. Our experiments demonstrate that optimistic exploration significantly accelerates learning in…
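The idea of optimism informed by a belief that connects reward and state can be illustrated with a toy sketch. This is not the authors' implementation: the correlated two-dimensional Gaussian belief, the function names (sample_joint, optimistic_thompson_sample), and the parameters (rho, k) are all assumptions made for illustration. The sketch draws several joint (next state, reward) samples from the belief and keeps the one with the highest sampled reward, so the chosen next state shifts together with the optimistic reward.

```python
import math
import random


def sample_joint(mean_s, mean_r, std_s, std_r, rho, rng):
    """Draw one (next_state, reward) pair from a correlated 2D Gaussian belief.

    The Gaussian form and the correlation rho are illustrative assumptions,
    standing in for whatever joint model is actually learned.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    next_state = mean_s + std_s * z1
    # Mixing z1 into the reward gives correlation rho with the next state.
    reward = mean_r + std_r * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return next_state, reward


def optimistic_thompson_sample(mean_s, mean_r, std_s, std_r, rho, k, rng):
    """Draw k joint samples and keep the pair with the highest sampled reward."""
    candidates = [
        sample_joint(mean_s, mean_r, std_s, std_r, rho, rng) for _ in range(k)
    ]
    return max(candidates, key=lambda pair: pair[1])


rng = random.Random(0)
n = 2000

# Plain Thompson sampling: one joint draw per step.
plain = [sample_joint(0.0, 0.0, 1.0, 1.0, 0.9, rng) for _ in range(n)]

# Optimistic variant: best of k = 5 joint draws per step.
optimistic = [
    optimistic_thompson_sample(0.0, 0.0, 1.0, 1.0, 0.9, 5, rng) for _ in range(n)
]

mean_plain_r = sum(r for _, r in plain) / n
mean_opt_r = sum(r for _, r in optimistic) / n
mean_opt_s = sum(s for s, _ in optimistic) / n

print("mean reward, plain:", round(mean_plain_r, 3))
print("mean reward, optimistic:", round(mean_opt_r, 3))
print("mean next state, optimistic:", round(mean_opt_s, 3))
```

Because the samples are joint, selecting on reward also moves the sampled next state. With rho set to 0, the optimistic reward would say nothing about which states to visit, which is the gap the abstract attributes to methods that treat the two uncertainties separately.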
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
Methods: Sparse Evolutionary Training
