Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits

Jongyeong Lee; Junya Honda; Chao-Kai Chiang; Masashi Sugiyama

arXiv:2302.01544·cs.LG·February 6, 2023·1 cites

TL;DR

This paper investigates the optimality of Thompson sampling with noninformative priors for Pareto bandits, demonstrating conditions under which it achieves optimal regret bounds and highlighting the importance of prior choice and truncation procedures.

Contribution

The paper provides the first analysis of Thompson sampling's optimality for Pareto bandits, showing how the choice of prior and the use of a truncation procedure affect the regret bound.

Findings

01. TS with certain probability matching priors achieves the optimal regret bound.

02. TS with other priors, including the Jeffreys and reference priors, can be suboptimal without truncation.

03. With a truncation procedure, TS with the Jeffreys and reference priors attains the asymptotic lower bound.

Abstract

In the stochastic multi-armed bandit problem, a randomized probability matching policy called Thompson sampling (TS) has shown excellent performance in various reward models. In addition to the empirical performance, TS has been shown to achieve asymptotic problem-dependent lower bounds in several models. However, its optimality has been mainly addressed under light-tailed or one-parameter models that belong to exponential families. In this paper, we consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters. Specifically, we discuss the optimality of TS with probability matching priors that include the Jeffreys prior and the reference priors. We first prove that TS with certain probability matching priors can achieve the optimal regret bound. Then, we show the suboptimality of TS with other priors, including the Jeffreys and…
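The setting described in the abstract can be illustrated with a minimal sketch of Thompson sampling on Pareto arms with unknown scale and shape. Everything specific here is an assumption made for illustration and is not taken from the excerpt: the prior family pi(kappa, alpha) proportional to alpha^(-k) / kappa with a hypothetical index `k`, the helper names `pareto_mean`, `sample_posterior`, and `thompson_sampling`, and the number of initial pulls. Under that assumed prior the posterior factorizes as alpha | data ~ Gamma(n - k, rate = sum of log(x_i / min x)) and kappa | alpha with CDF (kappa / min x)^(n alpha) on (0, min x]. The truncation procedure mentioned in the abstract is not implemented.

```python
import math
import random


def pareto_mean(kappa, alpha):
    """Mean of Pareto(scale=kappa, shape=alpha); infinite when alpha <= 1."""
    return math.inf if alpha <= 1.0 else alpha * kappa / (alpha - 1.0)


def sample_posterior(rewards, k, rng):
    """Draw (kappa, alpha) from the posterior of one arm.

    ASSUMPTION: prior pi(kappa, alpha) proportional to alpha**(-k) / kappa.
    Requires len(rewards) > k and at least two distinct rewards.
    """
    n = len(rewards)
    m = min(rewards)
    # Guard against a zero rate if all rewards happened to coincide.
    log_sum = max(sum(math.log(x / m) for x in rewards), 1e-12)
    # alpha | data ~ Gamma(shape = n - k, rate = log_sum);
    # random.gammavariate takes a scale, hence 1 / rate.
    alpha = max(rng.gammavariate(n - k, 1.0 / log_sum), 1e-300)
    # kappa | alpha, data has CDF (kappa / m) ** (n * alpha) on (0, m],
    # so inverse-CDF sampling gives kappa = m * U ** (1 / (n * alpha)).
    u = 1.0 - rng.random()  # uniform on (0, 1]
    kappa = m * u ** (1.0 / (n * alpha))
    return kappa, alpha


def thompson_sampling(arms, horizon, k=1, seed=0):
    """Run TS on Pareto arms given as (kappa, alpha) pairs; return pulled indices."""
    rng = random.Random(seed)
    n_init = max(2, k + 1)  # enough pulls for a proper posterior (assumed choice)
    history = [[] for _ in arms]
    pulls = []
    for t in range(horizon):
        if t < n_init * len(arms):
            i = t % len(arms)  # round-robin initialization
        else:
            # Sample parameters per arm and play the largest sampled mean.
            means = [pareto_mean(*sample_posterior(h, k, rng)) for h in history]
            best = max(means)
            i = rng.choice([j for j, v in enumerate(means) if v == best])
        kappa, alpha = arms[i]
        # random.paretovariate draws from Pareto with scale 1; rescale by kappa.
        history[i].append(kappa * rng.paretovariate(alpha))
        pulls.append(i)
    return pulls


if __name__ == "__main__":
    # Arm 1 has the larger true mean (3.0 versus 1.5).
    pulls = thompson_sampling([(1.0, 3.0), (1.0, 1.5)], horizon=2000, k=1)
    print("pull counts:", [pulls.count(i) for i in range(2)])
```

A sampled shape of at most 1 gives an infinite sampled mean, so ties among such arms are broken uniformly at random in this sketch; how the paper handles that case is not stated in the excerpt.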

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems

Methods: Spatio-temporal stability analysis