On the Suboptimality of Thompson Sampling in High Dimensions
Raymond Zhang, Richard Combes

TL;DR
This paper shows that Thompson Sampling can perform poorly in high-dimensional combinatorial semi-bandit problems: its regret scales exponentially in the ambient dimension and its minimax regret scales almost linearly. Adding a fixed amount of forced exploration does not fix the issue.
Contribution
The paper demonstrates that Thompson Sampling is sub-optimal for combinatorial semi-bandits in high dimensions: its regret grows exponentially in the ambient dimension, and a fixed amount of forced exploration does not alleviate the problem.
Findings
Thompson Sampling's regret scales exponentially with dimension.
Adding a fixed amount of forced exploration to Thompson Sampling does not alleviate the problem.
Numerical results confirm poor practical performance in high dimensions.
Abstract
In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly. This phenomenon occurs under a wide variety of assumptions, including both non-linear and linear reward functions, with Bernoulli distributed rewards and uniform priors. We also show that adding a fixed amount of forced exploration to TS does not alleviate the problem. We complement our theoretical results with numerical results and show that in practice TS can indeed perform very poorly in some high-dimensional situations.
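For readers unfamiliar with the setting, the following is a minimal sketch of Thompson Sampling on a combinatorial semi-bandit with Bernoulli rewards and uniform (Beta(1, 1)) priors, as named in the abstract. Everything else is an assumption made for illustration: the action set (all subsets of `m` items out of `d`), the linear reward (sum of the chosen items' rewards), and the names `thompson_sampling_m_sets`, `theta`, `m`, and `horizon`. It is not the authors' code and not the hard instance analyzed in the paper, so it should not be expected to reproduce the exponential regret.

```python
import numpy as np

def thompson_sampling_m_sets(theta, m, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on an m-set semi-bandit (illustrative).

    theta   : array of true Bernoulli means, one per item (hypothetical instance)
    m       : number of items played each round (assumed action set: all m-subsets)
    horizon : number of rounds
    Returns the cumulative pseudo-regret after each round.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta, dtype=float)
    d = len(theta)
    alpha = np.ones(d)  # Beta(1, 1) is the uniform prior; alpha counts successes + 1
    beta = np.ones(d)   # beta counts failures + 1
    best_value = np.sort(theta)[-m:].sum()  # expected reward of the best m-subset
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        # Draw one posterior sample per item, then play the m items with the
        # largest samples (the optimal action under the sampled means).
        samples = rng.beta(alpha, beta)
        action = np.argpartition(samples, -m)[-m:]
        # Semi-bandit feedback: observe a Bernoulli reward for each played item.
        rewards = (rng.random(m) < theta[action]).astype(float)
        alpha[action] += rewards
        beta[action] += 1.0 - rewards
        # Accumulate the expected gap between the best action and the one played.
        total += best_value - theta[action].sum()
        regret[t] = total
    return regret
```

A call such as `thompson_sampling_m_sets(np.array([0.9, 0.8, 0.5, 0.4]), m=2, horizon=1000)` returns the regret curve for one run; averaging over several seeds gives a smoother estimate.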
Taxonomy
Topics: Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing · Advanced Causal Inference Techniques
Methods: Spatio-temporal stability analysis
