Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox

Raymond Zhang; Richard Combes

arXiv:2410.05441·stat.ML·October 10, 2024

TL;DR

This paper proposes a Thompson Sampling algorithm for linear combinatorial semi-bandits with subgaussian rewards whose finite-time regret does not scale exponentially with the problem dimension. It also identifies a paradox: sampling from a well-chosen Gaussian posterior can outperform sampling from the correct posterior.

Contribution

It presents the first known Thompson Sampling method whose finite-time regret is not exponential in the dimension, and it describes the mismatched sampling paradox for bandit algorithms.

Findings

01

A Thompson Sampling variant achieves finite-time regret that is polynomial, rather than exponential, in the dimension for linear combinatorial semi-bandits.

02

Mismatched sampling from a well-chosen Gaussian posterior can perform exponentially better than sampling from the correct posterior.

03

Code for experiments is publicly available.

Abstract

We consider Thompson Sampling (TS) for linear combinatorial semi-bandits and subgaussian rewards. We propose the first known TS whose finite-time regret does not scale exponentially with the dimension of the problem. We further show the "mismatched sampling paradox": a learner who knows the reward distributions and samples from the correct posterior distribution can perform exponentially worse than a learner who does not know the reward distributions and simply samples from a well-chosen Gaussian posterior. The code used to generate the experiments is available at https://github.com/RaymZhang/CTS-Mismatched-Paradox.
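To make the setting concrete, the following is a minimal sketch of Gaussian Thompson Sampling on a toy semi-bandit where each action is a set of m items out of d. It is not the authors' algorithm or code. The function name `gaussian_cts`, the top-m action set, the Bernoulli rewards, the pseudo-observation initialization, and the 1/count sampling variance are all assumptions chosen for illustration. The learner draws Gaussian samples even though the rewards are Bernoulli, which is the kind of mismatched sampler the abstract refers to.

```python
import numpy as np

def gaussian_cts(means, m, horizon, seed=0):
    """Illustrative Gaussian Thompson Sampling for a top-m semi-bandit.

    Assumptions (not from the paper): Bernoulli rewards, actions are
    all subsets of size m, sampling variance 1/count per item.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    d = len(means)
    # Start with one pseudo-observation of 0.5 per item (a simplification
    # that avoids dividing by zero before any item has been played).
    counts = np.ones(d)
    sums = np.full(d, 0.5)
    best = np.sort(means)[-m:].sum()  # expected reward of the best action
    regret = 0.0
    for _ in range(horizon):
        mu_hat = sums / counts
        # One independent Gaussian sample per item, centered on the
        # empirical mean, with standard deviation 1/sqrt(count).
        theta = rng.normal(mu_hat, 1.0 / np.sqrt(counts))
        # Linear maximization over the action set: for top-m actions
        # this is just the m items with the largest sampled values.
        action = np.argsort(theta)[-m:]
        # Semi-bandit feedback: one reward observed per chosen item.
        rewards = rng.binomial(1, means[action])
        sums[action] += rewards
        counts[action] += 1
        regret += best - means[action].sum()
    return regret, counts

regret, counts = gaussian_cts([0.9, 0.8, 0.2, 0.1], m=2, horizon=500)
print(f"cumulative regret: {regret:.2f}, plays per item: {counts - 1}")
```

For richer action sets (paths, matchings), the `argsort` line would be replaced by a call to the corresponding combinatorial optimization oracle; the rest of the loop keeps the same shape.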

Code & Models

Repositories

raymzhang/cts-mismatched-paradox (Official)

Videos

Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Sparse and Compressive Sensing Techniques

Methods: Spatio-temporal stability analysis