BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings
Karine Karine, Susan A. Murphy, Benjamin M. Marlin

TL;DR
This paper introduces BOTS, a batch Bayesian optimization method that extends Thompson sampling for reinforcement learning in severely episode-limited settings, improving policy performance with fewer episodes.
Contribution
The paper develops an extended Thompson sampling approach whose action bias terms are learned via batch Bayesian optimization, enabling better policies in episode-limited RL scenarios.
Findings
BOTS outperforms standard Thompson sampling in total return.
BOTS requires fewer episodes than value function and policy gradient methods.
BOTS is effective in a behavioral dynamics simulation environment.
Abstract
In settings where the application of reinforcement learning (RL) requires running real-world trials, including the optimization of adaptive health interventions, the number of episodes available for learning can be severely limited due to cost or time constraints. In this setting, the bias-variance trade-off of contextual bandit methods can be significantly better than that of more complex full RL methods. However, Thompson sampling bandits are limited to selecting actions based on distributions of immediate rewards. In this paper, we extend the linear Thompson sampling bandit to select actions based on a state-action utility function consisting of the Thompson sampler's estimate of the expected immediate reward combined with an action bias term. We use batch Bayesian optimization over episodes to learn the action bias terms with the goal of maximizing the expected return of the…
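The action-selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an additive combination of the sampled immediate reward and the bias term, a constant bias per action, and a per-action Bayesian linear regression posterior. The names `LinearTS`, `select_action`, and `action_bias` are hypothetical. In the paper's setup the bias terms would be proposed by an outer batch Bayesian optimization loop over episodes, which is not shown here.

```python
import numpy as np


class LinearTS:
    """Per-action Bayesian linear regression with a Gaussian posterior (illustrative)."""

    def __init__(self, n_actions, dim, prior_var=1.0, noise_var=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_var = noise_var
        # Posterior precision matrix and precision-weighted mean, one per action.
        self.precision = np.stack([np.eye(dim) / prior_var for _ in range(n_actions)])
        self.b = np.zeros((n_actions, dim))

    def sample_rewards(self, context):
        """Draw one posterior weight sample per action; return sampled expected rewards."""
        rewards = np.empty(len(self.b))
        for a in range(len(self.b)):
            cov = np.linalg.inv(self.precision[a])
            cov = (cov + cov.T) / 2.0  # symmetrize against round-off
            mean = cov @ self.b[a]
            weights = self.rng.multivariate_normal(mean, cov)
            rewards[a] = weights @ context
        return rewards

    def update(self, context, action, reward):
        """Standard conjugate update for the chosen action's posterior."""
        self.precision[action] += np.outer(context, context) / self.noise_var
        self.b[action] += context * reward / self.noise_var


def select_action(sampler, context, action_bias):
    """Pick the action maximizing sampled reward plus bias (additive form is an assumption)."""
    utility = sampler.sample_rewards(context) + action_bias
    return int(np.argmax(utility))
```

With `action_bias` set to all zeros, `select_action` reduces to ordinary linear Thompson sampling, so the bias vector is the only extra quantity an outer optimizer would need to tune under these assumptions.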
Taxonomy
Topics: Pneumonia and Respiratory Infections · Orthopedic Infections and Treatments · Anomaly Detection Techniques and Applications
