Bootstrapped Thompson Sampling and Deep Exploration

Ian Osband; Benjamin Van Roy

arXiv:1507.00300·stat.ML·July 2, 2015·60 cites

TL;DR

This paper introduces a bootstrap-based exploration method for bandit and reinforcement learning problems that mimics Thompson sampling without requiring explicit posterior sampling, making it suitable for deep learning contexts.

Contribution

It proposes a bootstrap approach that combines observed data with artificially generated data. The artificial data induces a prior that drives effective exploration, so complex models need no explicit posterior sampling.

Findings

01 Effective exploration achieved without explicit posterior sampling

02 Suited to deep learning contexts, where maintaining or sampling from a posterior is computationally infeasible

03 Demonstrates comparable performance to Thompson sampling

Abstract

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.
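
The idea in the abstract can be illustrated with a minimal sketch for a Bernoulli multi-armed bandit. This is not the authors' exact algorithm. It assumes binary rewards and seeds each arm with one artificial success and one artificial failure to play the role of the induced prior. The names `bootstrap_mean` and `run_bootstrapped_ts` are hypothetical. At each step, every arm's history (artificial plus observed) is resampled with replacement, and the arm with the highest resampled mean is pulled, so the resampling noise stands in for a posterior sample.

```python
import random


def bootstrap_mean(rewards, rng):
    """Mean of a same-size resample (with replacement) of `rewards`."""
    n = len(rewards)
    return sum(rng.choice(rewards) for _ in range(n)) / n


def run_bootstrapped_ts(true_means, horizon, seed=0):
    """Simulate a Bernoulli bandit; return the number of pulls per arm."""
    rng = random.Random(seed)
    # Artificial data (an assumed choice): one success and one failure per arm.
    # Without it, an arm whose first real reward is 0 would always resample
    # to a mean of 0 and could be abandoned for good.
    data = [[1.0, 0.0] for _ in true_means]
    pulls = [0] * len(true_means)
    for _ in range(horizon):
        # One bootstrap estimate per arm stands in for a posterior sample.
        estimates = [bootstrap_mean(d, rng) for d in data]
        # Act greedily on the resampled estimates, breaking ties at random.
        arm = max(range(len(true_means)),
                  key=lambda a: (estimates[a], rng.random()))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        data[arm].append(reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_bootstrapped_ts([0.2, 0.8], horizon=500))
```

Resampling the full history at every step keeps the sketch short but costs time linear in the amount of data collected. A practical variant would likely maintain several bootstrap replicates incrementally instead.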

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics