Thompson Sampling for Unimodal Bandits

Long Yang; Zhao Li; Zehong Hu; Shasha Ruan; Shijian Li; Gang Pan; Hongyang Chen

arXiv:2106.08187·cs.LG·June 17, 2021

Open Access

TL;DR

This paper introduces a Thompson Sampling algorithm tailored for unimodal bandits, leveraging the unimodal structure to achieve asymptotic optimality and improved regret bounds over standard methods.

Contribution

It proposes a novel Thompson Sampling approach that exploits unimodal structure, providing theoretical regret bounds and demonstrating superior empirical performance.

Findings

01

Achieves asymptotically optimal regret for Bernoulli rewards.

02

Attains $O(\log T)$ regret for Gaussian rewards, which the abstract describes as far better than standard Thompson Sampling algorithms.

03

Shows effectiveness on synthetic and real-world data.
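
To make "asymptotically optimal" in finding 01 concrete, the block below writes out the regret lower bound commonly stated for unimodal bandits in prior work. This page does not give the bound, so the notation is an assumption: $\mu_k$ is the mean reward of arm $k$, $k^*$ is the optimal arm with mean $\mu^*$, $N(k^*)$ is the set of neighbors of $k^*$, and $R(T)$ is the expected regret after $T$ steps. An algorithm whose regret matches this constant in front of $\log T$ is asymptotically optimal in the sense used above.

```latex
\liminf_{T \to \infty} \frac{R(T)}{\log T}
  \;\ge\; \sum_{k \in N(k^*)} \frac{\mu^* - \mu_k}{\mathrm{KL}(\mu_k, \mu^*)},
\qquad
\mathrm{KL}(p, q) = p \log\frac{p}{q} + (1 - p) \log\frac{1 - p}{1 - q}
\quad \text{(Bernoulli rewards)}.
```

The sum runs only over the neighbors of the optimal arm rather than over all suboptimal arms, which is where the unimodal structure reduces the achievable regret.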

Abstract

In this paper, we propose a Thompson Sampling algorithm for unimodal bandits, where the expected reward is unimodal over the partially ordered arms. To better exploit the unimodal structure, at each step, instead of exploring the entire decision space, our algorithm makes its decision according to the posterior distribution only in the neighborhood of the arm with the highest empirical mean estimate. We theoretically prove that, for Bernoulli rewards, the regret of our algorithm reaches the lower bound for unimodal bandits, and thus it is asymptotically optimal. For Gaussian rewards, the regret of our algorithm is $O(\log T)$, which is far better than standard Thompson Sampling algorithms. Extensive experiments demonstrate the effectiveness of the proposed algorithm on both synthetic data sets and real-world applications.
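
The selection rule described in the abstract can be sketched roughly as follows for Bernoulli rewards. This is a minimal illustration, not the authors' implementation. Several things are assumptions that do not come from this page: the arms are placed on a line graph (so the neighbors of arm `k` are `k - 1` and `k + 1`), each arm has a uniform Beta(1, 1) prior, every arm is played once before the leader is computed, and the names `neighborhood`, `select_arm`, and `run` are hypothetical. The paper's full algorithm may include further steps that the abstract does not state.

```python
import random


def neighborhood(leader, n_arms):
    """Leader plus its adjacent arms on a line graph (assumed structure)."""
    return [k for k in (leader - 1, leader, leader + 1) if 0 <= k < n_arms]


def select_arm(successes, failures, rng):
    """Pick the empirical leader, then Thompson-sample only around it."""
    n_arms = len(successes)
    # Assumption: play every arm once so empirical means are defined.
    for k in range(n_arms):
        if successes[k] + failures[k] == 0:
            return k
    means = [successes[k] / (successes[k] + failures[k]) for k in range(n_arms)]
    leader = max(range(n_arms), key=lambda k: means[k])
    # Draw from the Beta posterior of the leader and its neighbors only.
    samples = {
        k: rng.betavariate(successes[k] + 1, failures[k] + 1)
        for k in neighborhood(leader, n_arms)
    }
    return max(samples, key=samples.get)


def run(mu, horizon, seed=0):
    """Simulate Bernoulli arms with means `mu`; return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(mu)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        k = select_arm(successes, failures, rng)
        reward = 1 if rng.random() < mu[k] else 0
        successes[k] += reward
        failures[k] += 1 - reward
        pulls[k] += 1
    return pulls
```

For example, `run([0.1, 0.3, 0.5, 0.7, 0.9], horizon=2000)` returns how often each arm was played on a unimodal instance whose peak is the last arm. Restricting the posterior draws to the leader's neighborhood is the design choice the abstract highlights: arms far from the current leader are never sampled at that step.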

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms