Contextual Thompson Sampling via Generation of Missing Data
Kelly W. Zhang, Tiffany Tianhui Cai, Hongseok Namkoong, Daniel Russo

TL;DR
This paper presents a contextual Thompson sampling framework in which a generative model, learned offline, imputes missing outcomes. The imputed data drive both uncertainty quantification and action selection, and the regret bound depends on the generative model's offline prediction loss.
Contribution
The paper introduces a generative-model-based approach to Thompson sampling (TS) in contextual bandits, with a formal regret analysis that depends on the model's offline prediction quality.
Findings
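The regret bound depends on the generative model's offline prediction loss.
At each decision time, the algorithm imputes missing outcomes (both future and counterfactual), fits a policy on the completed dataset, and uses that policy to select the next action.
The framework achieves state-of-the-art regret guarantees.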
Abstract
We introduce a framework for Thompson sampling (TS) contextual bandit algorithms, in which the algorithm's ability to quantify uncertainty and make decisions depends on the quality of a generative model that is learned offline. Instead of viewing uncertainty in the environment as arising from unobservable latent parameters, our algorithm treats uncertainty as stemming from missing, but potentially observable outcomes (including both future and counterfactual outcomes). If these outcomes were all observed, one could simply make decisions using an "oracle" policy fit on the complete dataset. Inspired by this conceptualization, at each decision-time, our algorithm uses a generative model to probabilistically impute missing outcomes, fits a policy using the imputed complete dataset, and uses that policy to select the next action. We formally show that this algorithm is a generative…
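The decision loop described in the abstract (impute the missing outcomes, fit a policy on the completed data, act) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes binary outcomes and a small discrete context set, and it swaps in a Beta-Bernoulli Polya urn for the learned generative model. All function names and parameters (`impute_missing`, `fit_oracle_policy`, `select_action`, `run_bandit`, `n_total`) are hypothetical.

```python
import random


def impute_missing(observed, n_total, rng, prior=(1.0, 1.0)):
    """Complete a list of binary outcomes up to n_total entries.

    Assumption: a Beta-Bernoulli Polya urn stands in for the learned
    generative model. Each missing outcome is drawn from the posterior
    predictive given everything observed or imputed so far.
    """
    successes = sum(observed)
    a = prior[0] + successes
    b = prior[1] + len(observed) - successes
    completed = list(observed)
    while len(completed) < n_total:
        y = 1 if rng.random() < a / (a + b) else 0
        completed.append(y)
        # Update the urn so later draws condition on this imputed value.
        a += y
        b += 1 - y
    return completed


def fit_oracle_policy(completed, contexts, actions):
    """Fit a simple "oracle" policy on a complete table.

    For each context, pick the action with the highest mean outcome.
    Requires every (context, action) list to be non-empty.
    """
    def mean(values):
        return sum(values) / len(values)

    return {x: max(actions, key=lambda a: mean(completed[(x, a)]))
            for x in contexts}


def select_action(history, context, contexts, actions, n_total, rng):
    """One decision step: impute, fit on the completed data, then act."""
    completed = {
        (x, a): impute_missing(history.get((x, a), []), n_total, rng)
        for x in contexts
        for a in actions
    }
    policy = fit_oracle_policy(completed, contexts, actions)
    return policy[context]


def run_bandit(true_means, contexts, actions, horizon, n_total=200, seed=0):
    """Simulate the loop on a synthetic Bernoulli environment (hypothetical)."""
    rng = random.Random(seed)
    history = {}
    pulls = {(x, a): 0 for x in contexts for a in actions}
    for _ in range(horizon):
        x = rng.choice(contexts)
        a = select_action(history, x, contexts, actions, n_total, rng)
        y = 1 if rng.random() < true_means[(x, a)] else 0
        history.setdefault((x, a), []).append(y)
        pulls[(x, a)] += 1
    return pulls
```

Calling `run_bandit` with a dictionary of true means keyed by `(context, action)` returns how often each pair was chosen. In this toy setting the randomness of the imputation plays the role that posterior sampling plays in standard TS: cells with few observations receive widely varying completions, so they keep being explored until the data settle the comparison.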
Taxonomy
Topics: Survey Sampling and Estimation Techniques · Bayesian Methods and Mixture Models · Machine Learning and Algorithms
