Rate-optimal Bayesian Simple Regret in Best Arm Identification

Junpei Komiyama; Kaito Ariu; Masahiro Kato; Chao Qin

arXiv:2111.09885 · cs.LG · July 27, 2023 · 1 citation

TL;DR

This paper characterizes the rate at which the Bayesian simple regret decreases in best arm identification for multi-armed bandits. It proposes a simple algorithm whose leading term matches the lower bound up to a constant factor, and supports the analysis with simulations.

Contribution

It characterizes the rate of the Bayesian simple regret under continuity conditions on the prior, and introduces a simple, easy-to-compute algorithm for best arm identification that is rate-optimal up to a constant factor.

Findings

01

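The leading term in the Bayesian simple regret comes from instances where the gap between the optimal and suboptimal arms is small, in contrast to Bayesian regret minimization.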

02

The leading term of the proposed algorithm's Bayesian simple regret matches the lower bound up to a constant factor.

03

Simulation results support the theoretical findings.

Abstract

We consider best arm identification in the multi-armed bandit problem. Assuming certain continuity conditions of the prior, we characterize the rate of the Bayesian simple regret. Differing from Bayesian regret minimization (Lai, 1987), the leading term in the Bayesian simple regret derives from the region where the gap between optimal and suboptimal arms is smaller than $\frac{\log T}{T}$. We propose a simple and easy-to-compute algorithm with its leading term matching with the lower bound up to a constant factor; simulation results support our theoretical findings.
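
To make the quantity in the abstract concrete: the Bayesian simple regret is the gap between the best arm's mean and the recommended arm's mean after $T$ pulls, averaged over bandit instances drawn from the prior. The sketch below estimates it by Monte Carlo for a plain uniform-allocation baseline. It is not the algorithm proposed in the paper. The Bernoulli rewards, the Uniform(0, 1) prior, the round-robin sampling rule, and the function name `bayes_simple_regret_uniform` are all assumptions made for illustration.

```python
import random


def bayes_simple_regret_uniform(n_arms, horizon, n_trials, seed=0):
    """Monte Carlo estimate of the Bayesian simple regret of a baseline.

    Assumptions (illustrative, not taken from the paper): Bernoulli arms,
    means drawn i.i.d. from a Uniform(0, 1) prior, round-robin sampling,
    and recommendation of the arm with the highest empirical mean.
    """
    rng = random.Random(seed)
    total_regret = 0.0
    for _ in range(n_trials):
        # Draw one bandit instance from the prior.
        means = [rng.random() for _ in range(n_arms)]
        pulls = [0] * n_arms
        successes = [0] * n_arms
        # Spend the budget of `horizon` pulls evenly across arms.
        for t in range(horizon):
            arm = t % n_arms
            pulls[arm] += 1
            if rng.random() < means[arm]:
                successes[arm] += 1
        # Recommend the empirically best arm (unpulled arms score 0).
        empirical = [
            successes[i] / pulls[i] if pulls[i] > 0 else 0.0
            for i in range(n_arms)
        ]
        recommended = max(range(n_arms), key=lambda i: empirical[i])
        # Simple regret of this instance: best mean minus recommended mean.
        total_regret += max(means) - means[recommended]
    # Averaging over prior draws gives the Bayesian simple regret estimate.
    return total_regret / n_trials


if __name__ == "__main__":
    for horizon in (20, 200, 2000):
        estimate = bayes_simple_regret_uniform(3, horizon, n_trials=400)
        print(horizon, estimate)
```

Running the script prints the estimate for a few horizons, which gives a rough feel for how quickly the regret shrinks as $T$ grows under these assumptions.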

Code & Models

Repositories

jkomiyama/bayesbai_paper (official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics