Bootstrapped Thompson Sampling and Deep Exploration
Ian Osband, Benjamin Van Roy

TL;DR
This paper introduces a bootstrap-based exploration method for bandit and reinforcement learning problems that mimics Thompson sampling without requiring explicit posterior sampling, making it suitable for deep learning contexts.
Contribution
It proposes a bootstrap approach in which artificially generated data, combined with the observed data, induces a prior that drives effective exploration, removing the need for explicit posterior sampling in complex models.
Findings
Effective exploration achieved without explicit posterior sampling
Well suited to deep learning settings, where maintaining or sampling from a posterior is computationally infeasible
Performs comparably to Thompson sampling
Abstract
This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.
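A minimal sketch of the bandit case described in the abstract, assuming a Bernoulli multi-armed bandit and using only Python's standard library. At each step every arm's observed rewards are pooled with artificial data and resampled with replacement, and the arm with the highest bootstrap mean is pulled. The names (bootstrap_sample, run_bandit, and so on) and the choice of one artificial success and one artificial failure per arm are illustrative assumptions, not the authors' exact algorithm. An exact Beta-posterior Thompson sampler is included for comparison.

```python
import random


def bootstrap_sample(history, prior_successes=1, prior_failures=1, rng=random):
    """Return one bootstrap estimate of an arm's success probability.

    history: observed 0/1 rewards for this arm.
    prior_successes / prior_failures: counts of artificial 1s and 0s
    appended to the observed data (an assumed stand-in for a prior).
    """
    data = list(history) + [1] * prior_successes + [0] * prior_failures
    n = len(data)
    if n == 0:
        raise ValueError("need observed or artificial data to bootstrap")
    # Resample n points with replacement and return their mean.
    resample = [data[rng.randrange(n)] for _ in range(n)]
    return sum(resample) / n


def bootstrap_sampler(history, rng):
    # Bootstrapped sampling with one artificial success and one failure.
    return bootstrap_sample(history, 1, 1, rng)


def thompson_sampler(history, rng):
    # Exact Thompson sampling: draw from the Beta(1 + s, 1 + f) posterior.
    successes = sum(history)
    failures = len(history) - successes
    return rng.betavariate(1 + successes, 1 + failures)


def run_bandit(true_probs, steps, sampler, seed=0):
    """Simulate a Bernoulli bandit; return (total reward, pulls per arm)."""
    rng = random.Random(seed)
    histories = [[] for _ in true_probs]
    total = 0
    for _ in range(steps):
        estimates = [sampler(h, rng) for h in histories]
        # Act greedily on the sampled estimates; break ties at random.
        arm = max(range(len(true_probs)),
                  key=lambda a: (estimates[a], rng.random()))
        reward = 1 if rng.random() < true_probs[arm] else 0
        histories[arm].append(reward)
        total += reward
    return total, [len(h) for h in histories]


if __name__ == "__main__":
    for name, sampler in [("bootstrap", bootstrap_sampler),
                          ("thompson", thompson_sampler)]:
        total, pulls = run_bandit([0.2, 0.5, 0.8], 2000, sampler)
        print(name, total, pulls)
```

Setting prior_successes and prior_failures to 0 illustrates why the artificial data matters: an arm whose only observation is a failure, for example bootstrap_sample([0], 0, 0), always returns 0.0, so the agent may never try that arm again. With artificial data added, the same arm keeps a nonzero chance of a high estimate.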
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
