Thompson Exploration with Best Challenger Rule in Best Arm Identification
Jongyeong Lee, Junya Honda, Masashi Sugiyama

TL;DR
This paper introduces a bandit policy that combines Thompson sampling with the best challenger rule for fixed-confidence best arm identification. The policy is asymptotically optimal for two-armed bandits, near optimal for K ≥ 3 arms, and computationally efficient.
Contribution
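The paper proposes a policy for best arm identification (BAI) that uses Thompson sampling for exploration together with the best challenger rule. Unlike most earlier policies, it does not solve an optimization problem at every round and does not force a minimum number of pulls per arm, while retaining asymptotic optimality guarantees.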
Findings
Asymptotically optimal for two-armed bandits.
Near optimal for multi-armed bandits with K ≥ 3.
Competitive sample complexity with less computation.
Abstract
This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models. For this problem, many policies have been proposed, but most of them require solving an optimization problem at every round and/or are forced to explore an arm at least a certain number of times except those restricted to the Gaussian model. To address these limitations, we propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule. While Thompson sampling was originally considered for maximizing the cumulative reward, we demonstrate that it can be used to naturally explore arms in BAI without forcing it. We show that our policy is asymptotically optimal for any two-armed bandit problem and achieves near optimality for general K-armed bandit problems for…
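To make the idea in the abstract concrete, here is a minimal Python sketch of how Thompson sampling and a best challenger rule might be combined. It is an illustration under stated assumptions, not the authors' algorithm: it assumes unit-variance Gaussian arms, picks between leader and challenger by pulling whichever has fewer samples (a simplification), and uses a heuristic stopping threshold. The names `transport_cost` and `identify_best_arm` are hypothetical.

```python
import math
import random


def transport_cost(mean_a, n_a, mean_b, n_b):
    """Unit-variance Gaussian evidence that arm a really beats arm b."""
    gap = mean_a - mean_b
    return (n_a * n_b) / (n_a + n_b) * gap * gap / 2.0


def identify_best_arm(true_means, delta=0.05, max_rounds=100_000, seed=0):
    """Return (guessed best arm, number of pulls used). Needs >= 2 arms."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(i):
        # Simulated reward: Gaussian with unit variance (assumption).
        sums[i] += rng.gauss(true_means[i], 1.0)
        counts[i] += 1

    for i in range(k):  # one initial pull per arm so every mean is defined
        pull(i)

    leader = 0
    for t in range(k, max_rounds):  # t equals the total number of pulls so far
        means = [sums[i] / counts[i] for i in range(k)]
        leader = max(range(k), key=lambda i: means[i])

        # Best challenger: the arm that is cheapest to confuse with the leader.
        costs = {
            i: transport_cost(means[leader], counts[leader], means[i], counts[i])
            for i in range(k)
            if i != leader
        }
        challenger = min(costs, key=costs.get)

        # Heuristic stopping threshold (an assumption, not from the paper).
        if costs[challenger] >= math.log((1.0 + math.log(t)) / delta):
            return leader, t

        # Thompson exploration: one posterior sample per arm (flat prior).
        samples = [rng.gauss(means[i], 1.0 / math.sqrt(counts[i])) for i in range(k)]
        sampled_best = max(range(k), key=lambda i: samples[i])

        if sampled_best != leader:
            # The posterior still doubts the leader, so explore the sampled arm.
            pull(sampled_best)
        else:
            # Simplified leader-vs-challenger choice: pull the less-sampled one.
            pull(leader if counts[leader] <= counts[challenger] else challenger)

    return leader, max_rounds
```

For example, `identify_best_arm([0.0, 3.0], delta=0.05, seed=1)` returns a pair of the guessed arm index and the pulls used. Each round costs only O(K) arithmetic and K posterior draws, with no per-round optimization problem and no explicit forced exploration, which is the computational point the abstract emphasizes.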
