Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits
Jongyeong Lee, Junya Honda, Chao-Kai Chiang, Masashi Sugiyama

TL;DR
This paper investigates the optimality of Thompson sampling with noninformative priors for Pareto bandits, demonstrating conditions under which it achieves optimal regret bounds and highlighting the importance of prior choice and truncation procedures.
Contribution
It provides the first analysis of Thompson sampling's optimality for Pareto bandits, showing how prior selection and truncation affect regret bounds.
Findings
TS with certain probability matching priors achieves the optimal regret bound.
TS with Jeffreys and reference priors can be suboptimal without truncation.
Truncation procedures enable these priors to attain asymptotic lower bounds.
Abstract
In the stochastic multi-armed bandit problem, a randomized probability matching policy called Thompson sampling (TS) has shown excellent performance in various reward models. In addition to the empirical performance, TS has been shown to achieve asymptotic problem-dependent lower bounds in several models. However, its optimality has been mainly addressed under light-tailed or one-parameter models that belong to exponential families. In this paper, we consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters. Specifically, we discuss the optimality of TS with probability matching priors that include the Jeffreys prior and the reference priors. We first prove that TS with certain probability matching priors can achieve the optimal regret bound. Then, we show the suboptimality of TS with other priors, including the Jeffreys and…
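To make the setting concrete, here is a minimal sketch of Thompson sampling on Pareto arms with unknown scale κ and shape α. It assumes a prior of the form π(κ, α) ∝ α^(-k)/κ, where the exponent `k` is a hypothetical hyperparameter introduced here for illustration and not taken from the summary above. Under that assumption, with n observations, m their minimum, and S = Σ log(x_i/m), the shape posterior is Gamma(n - k, rate S), and κ given α has CDF (κ/m)^(nα) on (0, m]. All function names are illustrative, and the sketch does not implement the truncation procedure mentioned in the findings.

```python
import math
import random


def sample_posterior(xs, k, rng):
    """Draw (kappa, alpha) from the posterior under the ASSUMED prior
    pi(kappa, alpha) proportional to alpha**(-k) / kappa."""
    n = len(xs)
    m = min(xs)
    s = sum(math.log(x / m) for x in xs)
    if n - k <= 0 or s <= 0.0:
        raise ValueError("need n > k and at least two distinct observations")
    # Marginal posterior of the shape: Gamma(shape=n-k, rate=s).
    # random.gammavariate takes a scale parameter, hence 1/s.
    alpha = rng.gammavariate(n - k, 1.0 / s)
    # Conditional posterior of the scale: CDF (kappa/m)**(n*alpha) on (0, m],
    # sampled by inverting the CDF at a uniform draw u in (0, 1].
    u = 1.0 - rng.random()
    kappa = m * u ** (1.0 / (n * alpha))
    return kappa, alpha


def pareto_mean(kappa, alpha):
    """Mean of Pareto(kappa, alpha); infinite when alpha <= 1."""
    return kappa * alpha / (alpha - 1.0) if alpha > 1.0 else math.inf


def draw_pareto(kappa, alpha, rng):
    """Inverse-CDF draw from Pareto(kappa, alpha)."""
    u = 1.0 - rng.random()
    return kappa * u ** (-1.0 / alpha)


def thompson_sampling(arms, horizon, k=1, n_init=3, seed=0):
    """Run TS on arms given as (kappa, alpha) pairs; return pull counts.

    Each arm is first pulled n_init times so the posterior is proper
    (this requires n_init > k and n_init >= 2).
    """
    rng = random.Random(seed)
    history = [[draw_pareto(ka, al, rng) for _ in range(n_init)]
               for ka, al in arms]
    for _ in range(horizon - n_init * len(arms)):
        # Sample parameters per arm and play the arm with the largest sampled mean.
        scores = [pareto_mean(*sample_posterior(xs, k, rng)) for xs in history]
        best = max(range(len(arms)), key=scores.__getitem__)
        history[best].append(draw_pareto(*arms[best], rng))
    return [len(xs) for xs in history]


if __name__ == "__main__":
    # Two hypothetical arms with true means 1.5 and 3.0.
    print(thompson_sampling([(1.0, 3.0), (2.0, 3.0)], horizon=500))
```

Because a sampled shape α ≤ 1 yields an infinite sampled mean, such a draw makes its arm win that round automatically, which gives some intuition for why the choice of prior matters in this heavy-tailed model.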
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
Methods: Spatio-temporal stability analysis
