Rate-optimal Bayesian Simple Regret in Best Arm Identification
Junpei Komiyama, Kaito Ariu, Masahiro Kato, Chao Qin

TL;DR
This paper analyzes the rate at which the Bayesian simple regret decreases in best arm identification for multi-armed bandits. It proposes an algorithm whose leading term matches the lower bound up to a constant factor, and supports the theory with simulations.
Contribution
It characterizes the rate of the Bayesian simple regret under certain continuity conditions on the prior and introduces a simple, easy-to-compute, near-optimal algorithm for best arm identification.
Findings
The leading term in the Bayesian simple regret comes from the region where the gap between the optimal and suboptimal arms is small.
The leading term of the proposed algorithm's regret matches the lower bound up to a constant factor.
Simulation results support the theoretical findings.
Abstract
We consider best arm identification in the multi-armed bandit problem. Assuming certain continuity conditions of the prior, we characterize the rate of the Bayesian simple regret. Differing from Bayesian regret minimization (Lai, 1987), the leading term in the Bayesian simple regret derives from the region where the gap between optimal and suboptimal arms is smaller than $\sqrt{(\log T)/T}$. We propose a simple and easy-to-compute algorithm with its leading term matching with the lower bound up to a constant factor; simulation results support our theoretical findings.
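To make the quantity in the abstract concrete, the following is a minimal Monte Carlo sketch of the Bayesian simple regret: the gap between the best arm's mean and the recommended arm's mean, averaged over arm means drawn from a prior. It is not the paper's algorithm. It assumes Bernoulli rewards, a uniform prior on [0, 1], and a plain baseline that pulls every arm equally often and recommends the arm with the highest empirical mean. The function names `simple_regret_once` and `bayesian_simple_regret` are hypothetical and chosen only for illustration.

```python
import random


def simple_regret_once(num_arms, horizon, rng):
    """One trial: draw arm means from the prior, explore, recommend, return the regret."""
    pulls_per_arm = horizon // num_arms
    if pulls_per_arm < 1:
        raise ValueError("horizon must be at least num_arms")

    # Assumed prior: each Bernoulli mean is uniform on [0, 1].
    means = [rng.random() for _ in range(num_arms)]

    # Baseline allocation (not the paper's method): pull every arm equally often.
    empirical = []
    for mu in means:
        successes = sum(1 for _ in range(pulls_per_arm) if rng.random() < mu)
        empirical.append(successes / pulls_per_arm)

    # Recommend the arm with the highest empirical mean.
    recommended = max(range(num_arms), key=lambda i: empirical[i])

    # Simple regret: best true mean minus the true mean of the recommended arm.
    return max(means) - means[recommended]


def bayesian_simple_regret(num_arms, horizon, num_trials, seed=0):
    """Average the simple regret over many draws from the prior."""
    rng = random.Random(seed)
    total = sum(simple_regret_once(num_arms, horizon, rng) for _ in range(num_trials))
    return total / num_trials


if __name__ == "__main__":
    for horizon in (20, 100, 400):
        print(horizon, bayesian_simple_regret(num_arms=2, horizon=horizon, num_trials=2000))
```

Running the script prints the estimated Bayesian simple regret for a few horizons. Under these assumptions the estimate should shrink as the horizon grows, since recommendation errors become concentrated on instances whose gap is small.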
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
