Thompson Sampling with Diffusion Generative Prior
Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

TL;DR
This paper combines diffusion models with Thompson sampling for meta-learning in bandit problems: a diffusion model learns a prior over tasks, including from noisy or incomplete data, and Thompson sampling uses that learned prior on new tasks.
Contribution
It proposes a new method that integrates diffusion generative models with Thompson sampling for meta-learning in bandits, including a training procedure for noisy/incomplete data.
Findings
Effective in learning task priors across bandit problems
Balances prior knowledge with noisy observations
Shows promising results in extensive experiments
Abstract
In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance between the learned prior and the noisy observations that come from the learner's interaction with the environment. To capture realistic bandit scenarios, we also propose a novel diffusion model training procedure that trains even from incomplete and/or noisy data, which could be of independent interest. Finally, our extensive experimental evaluations clearly demonstrate the…
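To make the idea of balancing a prior against noisy observations concrete, here is a minimal Thompson sampling sketch for a multi-armed bandit with Gaussian rewards. It is not the paper's algorithm: the diffusion-model prior is replaced by a fixed, independent Gaussian prior per arm, so the posterior has a closed form. The function name `gaussian_thompson_sampling` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_thompson_sampling(true_means, prior_mean, prior_var,
                               noise_var, horizon, seed=0):
    """Thompson sampling with an independent Gaussian prior per arm.

    Assumption: a fixed Gaussian prior stands in for a learned prior,
    and the reward noise variance is known.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    k = true_means.shape[0]

    # Posterior parameters start at the prior (copied, so inputs stay untouched).
    post_mean = np.array(np.broadcast_to(prior_mean, (k,)), dtype=float)
    post_var = np.array(np.broadcast_to(prior_var, (k,)), dtype=float)

    pulls = np.zeros(k, dtype=int)
    regret = 0.0
    for _ in range(horizon):
        # Sample one plausible mean per arm from the current posterior.
        sampled_means = rng.normal(post_mean, np.sqrt(post_var))
        arm = int(np.argmax(sampled_means))

        # Observe a noisy reward from the chosen arm.
        reward = rng.normal(true_means[arm], np.sqrt(noise_var))

        # Conjugate Gaussian update: precisions add, means are precision-weighted.
        precision = 1.0 / post_var[arm] + 1.0 / noise_var
        post_mean[arm] = (post_mean[arm] / post_var[arm]
                          + reward / noise_var) / precision
        post_var[arm] = 1.0 / precision

        pulls[arm] += 1
        regret += true_means.max() - true_means[arm]
    return pulls, post_mean, post_var, regret


if __name__ == "__main__":
    pulls, post_mean, post_var, regret = gaussian_thompson_sampling(
        true_means=[0.1, 0.5, 0.9], prior_mean=0.5, prior_var=1.0,
        noise_var=0.25, horizon=500)
    print("pulls per arm:", pulls)
    print("cumulative regret:", regret)
```

In this simplified setting, a more informative prior (smaller `prior_var` centered near the true means) reduces early exploration. According to the abstract, the paper's approach obtains such a prior by training a diffusion model on the task distribution.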
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Domain Adaptation and Few-Shot Learning
Methods: Test · Diffusion
