Thompson Sampling with Diffusion Generative Prior

Yu-Guan Hsieh; Shiva Prasad Kasiviswanathan; Branislav Kveton; Patrick Blöbaum

arXiv:2301.05182·cs.LG·January 31, 2023

TL;DR

This paper combines denoising diffusion models with Thompson sampling for meta-learning in bandit problems: a diffusion model learns a prior from the task distribution, including from noisy or incomplete data, and Thompson sampling uses that prior on new tasks.

Contribution

The paper proposes a method that integrates diffusion generative models with Thompson sampling for meta-learning in bandits, together with a training procedure that works from noisy and/or incomplete data.

Findings

01 Effective in learning task priors across bandit problems

02 Balances prior knowledge with noisy observations

03 Shows promising results in extensive experiments

Abstract

In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance between the learned prior and the noisy observations that come from the learner's interaction with the environment. To capture realistic bandit scenarios, we also propose a novel diffusion model training procedure that trains even from incomplete and/or noisy data, which could be of independent interest. Finally, our extensive experimental evaluations clearly demonstrate the…
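To make the balance between a prior and noisy observations concrete, here is a minimal sketch of Thompson sampling on a Gaussian bandit. It is not the paper's algorithm: the learned diffusion prior is replaced by a simple independent Gaussian prior per arm, so the posterior has a closed form. The function name, its parameters, and the example reward means are all hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_thompson_sampling(true_means, prior_means, prior_var,
                               noise_var, horizon, seed=0):
    """Thompson sampling with an independent Gaussian prior per arm.

    Assumption: a conjugate Gaussian prior stands in for the learned
    diffusion prior described in the abstract.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    post_mean = list(prior_means)      # posterior mean of each arm
    post_var = [prior_var] * n_arms    # posterior variance of each arm
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one plausible mean per arm from the current posterior.
        samples = [rng.gauss(m, math.sqrt(v))
                   for m, v in zip(post_mean, post_var)]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Observe a noisy reward from the environment.
        reward = rng.gauss(true_means[arm], math.sqrt(noise_var))

        # Conjugate update: precisions add, and the new mean is a
        # precision-weighted average of the old mean and the reward.
        precision = 1.0 / post_var[arm] + 1.0 / noise_var
        post_mean[arm] = (post_mean[arm] / post_var[arm]
                          + reward / noise_var) / precision
        post_var[arm] = 1.0 / precision
        pulls[arm] += 1

    return post_mean, post_var, pulls


# Hypothetical three-armed task with a zero-mean, unit-variance prior.
post_mean, post_var, pulls = gaussian_thompson_sampling(
    true_means=[0.1, 0.5, 0.9],
    prior_means=[0.0, 0.0, 0.0],
    prior_var=1.0,
    noise_var=0.25,
    horizon=500,
)
```

In this sketch the prior dominates while an arm has few pulls, and the observations take over as the arm's posterior variance shrinks. A more informative prior, such as one learned from related tasks, would shift the initial samples toward good arms and reduce early exploration.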

Videos

Thompson Sampling with Diffusion Generative Prior · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Domain Adaptation and Few-Shot Learning

Methods: Test · Diffusion