Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown

Emile Anand; Sarah Liaw

arXiv:2507.15290 · cs.LG · October 27, 2025


TL;DR

This paper systematically evaluates Feel-Good Thompson Sampling (FG-TS) and its smoothed variant (SFG-TS) across eleven real-world and synthetic benchmarks. It shows where they help and where they fall short in high-dimensional and approximate-posterior settings, and recommends them as strong baselines.

Contribution

First comprehensive benchmarking of FG-TS and SFG-TS in real-world and synthetic contextual bandit problems, including approximate posterior regimes.

Findings

01. FG-TS outperforms vanilla TS in linear and logistic bandits.

02. Larger bonuses improve performance when posteriors are accurate but hurt it when sampling is noisy.

03. FG-TS is less effective in neural bandits but remains a competitive, easy-to-use baseline.

Abstract

Thompson Sampling (TS) is widely used to address the exploration/exploitation tradeoff in contextual bandits, yet recent theory shows that it does not explore aggressively enough in high-dimensional problems. Feel-Good Thompson Sampling (FG-TS) addresses this by adding an optimism bonus that biases toward high-reward models, and it achieves the asymptotically minimax-optimal regret in the linear setting when posteriors are exact. However, its performance with approximate posteriors (common in large-scale or neural problems) has not been benchmarked. We provide the first systematic study of FG-TS and its smoothed variant (SFG-TS) across eleven real-world and synthetic benchmarks. To evaluate their robustness, we compare settings with exact posteriors (linear and logistic bandits) against approximate regimes produced by fast but coarse stochastic-gradient…
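The abstract describes the optimism bonus only in words. As a rough illustration, the sketch below assumes the formulation usually associated with FG-TS for a linear reward model: a squared-error term scaled by `eta`, minus a bonus `lam * min(b, max_a f_theta(x, a))` for each past round, plus a Gaussian prior. The function names, the parameters `eta`, `lam`, `b`, and `prior_prec`, and the unadjusted Langevin sampler are all assumptions made for this example and are not taken from the paper's code.

```python
import numpy as np

def fg_neg_log_posterior(theta, contexts, actions, rewards,
                         eta=1.0, lam=0.1, b=1.0, prior_prec=1.0):
    """Assumed FG-TS objective for a linear model f_theta(x, a) = <x_a, theta>.

    contexts: (t, K, d) arm features, actions: (t,) chosen arms, rewards: (t,).
    """
    idx = np.arange(len(actions))
    preds = contexts @ theta                      # (t, K) predicted rewards
    sq_loss = eta * np.sum((preds[idx, actions] - rewards) ** 2)
    # Optimism bonus: reward models whose best arm looks good, capped at b.
    bonus = lam * np.sum(np.minimum(b, preds.max(axis=1)))
    prior = 0.5 * prior_prec * theta @ theta
    return sq_loss - bonus + prior

def fg_grad(theta, contexts, actions, rewards,
            eta=1.0, lam=0.1, b=1.0, prior_prec=1.0):
    """(Sub)gradient of fg_neg_log_posterior with respect to theta."""
    idx = np.arange(len(actions))
    preds = contexts @ theta
    resid = preds[idx, actions] - rewards
    grad = 2.0 * eta * contexts[idx, actions].T @ resid
    best = preds.argmax(axis=1)
    active = preds[idx, best] < b                 # the cap switches the bonus off
    grad -= lam * (contexts[idx, best] * active[:, None]).sum(axis=0)
    return grad + prior_prec * theta

def langevin_sample(theta0, contexts, actions, rewards,
                    n_steps=200, step=1e-3, rng=None, **kw):
    """Unadjusted Langevin steps: one way to draw an approximate posterior sample."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        g = fg_grad(theta, contexts, actions, rewards, **kw)
        noise = rng.standard_normal(theta.shape)
        theta = theta - step * g + np.sqrt(2.0 * step) * noise
    return theta
```

Setting `lam=0` recovers the objective of ordinary Thompson Sampling under the same model, so the bonus strength can be varied in isolation. In a bandit loop, one would draw `theta` with `langevin_sample` and then play the arm maximizing `contexts_t @ theta`.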


Videos

Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown (SlidesLive)