Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Jasmine Bayrooti; Carl Henrik Ek; Amanda Prorok

arXiv:2410.04988 · cs.LG · March 12, 2025

Open Access

TL;DR

This paper introduces a model-based reinforcement learning method using optimistic Thompson sampling that leverages joint uncertainty over transitions and rewards, leading to more efficient exploration and faster learning in complex control tasks.

Contribution

The paper presents what the authors describe as the first model structure capable of reasoning about joint uncertainty over transitions and rewards, enabling theoretically grounded optimistic exploration.

Findings

1. Accelerates learning in sparse-reward environments
2. Highlights the importance of model uncertainty in exploration
3. Effective in complex continuous control tasks

Abstract

Learning complex robot behavior through interactions with the environment necessitates principled exploration. Effective strategies should prioritize exploring regions of the state-action space that maximize rewards, with optimistic exploration emerging as a promising direction aligned with this idea and enabling sample-efficient reinforcement learning. However, existing methods overlook a crucial aspect: the need for optimism to be informed by a belief connecting the reward and state. To address this, we propose a practical, theoretically grounded approach to optimistic exploration based on Thompson sampling. Our model structure is the first that allows for reasoning about joint uncertainty over transitions and rewards. We apply our method on a set of MuJoCo and VMAS continuous control tasks. Our experiments demonstrate that optimistic exploration significantly accelerates learning in…
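As a rough illustration of the idea in the abstract, the sketch below draws several joint samples of next state and reward from a belief and keeps the one with the highest sampled reward. It is a toy under stated assumptions, not the authors' model: the joint Gaussian belief, the function name `optimistic_thompson_sample`, and the layout with the reward as the last entry are all hypothetical choices made for this example.

```python
import numpy as np

def optimistic_thompson_sample(mean, cov, num_samples, rng):
    """Draw joint (next_state, reward) samples and keep the most rewarding one.

    Assumed layout (hypothetical): the last entry of `mean` is the reward,
    the preceding entries are the next state. `cov` is the joint covariance.
    """
    draws = rng.multivariate_normal(mean, cov, size=num_samples)  # (K, d + 1)
    best = np.argmax(draws[:, -1])  # optimism: pick the highest sampled reward
    return draws[best, :-1], draws[best, -1]

rng = np.random.default_rng(0)
mean = np.array([0.0, 0.0])         # 1-D next state, then reward
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])        # reward strongly correlated with next state
next_state, reward = optimistic_thompson_sample(mean, cov, num_samples=10, rng=rng)
```

Because the samples are drawn jointly, the optimistic reward comes with a next state that is consistent with it under the belief. With `num_samples=1` the function reduces to an ordinary Thompson draw.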

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research

Methods: Sparse Evolutionary Training