On the Suboptimality of Thompson Sampling in High Dimensions

Raymond Zhang; Richard Combes

arXiv:2102.05502 · stat.ML · October 22, 2021 · 1 citation

TL;DR

This paper shows that Thompson Sampling can perform poorly in high-dimensional combinatorial semi-bandit problems: its regret scales exponentially in the ambient dimension, its minimax regret scales almost linearly, and adding a fixed amount of forced exploration does not fix the problem.

Contribution

The paper demonstrates that Thompson Sampling is sub-optimal for combinatorial semi-bandits in high dimensions: its regret grows exponentially in the ambient dimension, and a fixed amount of forced exploration does not remove this behavior.

Findings

01. The regret of Thompson Sampling scales exponentially in the ambient dimension.

02. Adding a fixed amount of forced exploration to Thompson Sampling does not alleviate the problem.

03. Numerical results show that Thompson Sampling can perform very poorly in practice in some high-dimensional situations.

Abstract

In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly. This phenomenon occurs under a wide variety of assumptions including both non-linear and linear reward functions, with Bernoulli distributed rewards and uniform priors. We also show that including a fixed amount of forced exploration to TS does not alleviate the problem. We complement our theoretical results with numerical results and show that in practice TS indeed can perform very poorly in some high dimensional situations.
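
For concreteness, the setting described in the abstract (linear rewards, Bernoulli item rewards, uniform priors) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `thompson_sampling_semi_bandit`, its parameters, and the two-decision instance in the usage example are assumptions chosen for illustration and are not taken from the paper or its repository.

```python
import random


def thompson_sampling_semi_bandit(means, decisions, horizon, seed=0):
    """Run Thompson Sampling with uniform Beta(1, 1) priors on each item.

    means:     true Bernoulli mean of each item (hypothetical instance).
    decisions: list of tuples of item indices; the reward of a decision
               is the sum of its items' rewards (linear reward).
    Returns the cumulative pseudo-regret over `horizon` rounds.
    """
    rng = random.Random(seed)
    d = len(means)
    alpha = [1] * d  # 1 + number of observed successes per item
    beta = [1] * d   # 1 + number of observed failures per item
    best_value = max(sum(means[i] for i in dec) for dec in decisions)
    regret = 0.0
    for _ in range(horizon):
        # Draw one posterior sample per item.
        sample = [rng.betavariate(alpha[i], beta[i]) for i in range(d)]
        # Play the decision that is optimal for the sampled means.
        chosen = max(decisions, key=lambda dec: sum(sample[i] for i in dec))
        # Semi-bandit feedback: observe a Bernoulli reward for every
        # item in the chosen decision and update its posterior.
        for i in chosen:
            if rng.random() < means[i]:
                alpha[i] += 1
            else:
                beta[i] += 1
        regret += best_value - sum(means[i] for i in chosen)
    return regret


if __name__ == "__main__":
    # Hypothetical instance: two disjoint decisions with m items each.
    m = 5
    means = [0.9] * m + [0.8] * m
    decisions = [tuple(range(m)), tuple(range(m, 2 * m))]
    print(thompson_sampling_semi_bandit(means, decisions, horizon=1000))
```

Increasing `m` in the usage example is one way to explore how the regret of this sketch changes with dimension. The paper's formal results concern specific instances that this toy setup does not necessarily reproduce.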

Code & Models

Repositories

RaymZhang/TS_Combinatorial_Semi_Bandits_Curse (Official)

Videos

On the Suboptimality of Thompson Sampling in High Dimensions · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Advanced Causal Inference Techniques

Methods: Spatio-temporal stability analysis