Best Arm Identification with Minimal Regret
Junwen Yang, Vincent Y. F. Tan, Tianyuan Jin

TL;DR
This paper introduces a new variant of the multi-armed bandit problem that combines best arm identification with minimal regret, providing theoretical bounds and an optimal algorithm for this dual objective.
Contribution
It formulates BAI with minimal regret, establishes lower bounds, and proposes the Double KL-UCB algorithm achieving asymptotic optimality in this setting.
Findings
Established an instance-dependent lower bound on expected regret.
Proved an impossibility result linking regret and sample complexity.
Designed the Double KL-UCB algorithm with asymptotic optimality.
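For readers unfamiliar with KL-UCB-style indices, the following is a minimal sketch of the standard KL-UCB upper confidence index for Bernoulli rewards. It is not the paper's Double KL-UCB algorithm, whose details are not given on this page. The function names `bernoulli_kl` and `kl_ucb_index` and the `budget` parameter (the exploration level, for example log t) are assumptions made for illustration.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence KL(Ber(p) || Ber(q)), with both arguments clipped into (0, 1)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, budget, tol=1e-9):
    """Approximate the largest q in [mean, 1] with pulls * KL(mean, q) <= budget.

    KL(mean, q) is increasing in q on [mean, 1], so bisection applies.
    `budget` is a hypothetical exploration level chosen by the caller.
    """
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid  # mid is too optimistic; shrink from above
    return lo


# Illustrative usage: the index tightens toward the empirical mean as pulls grow.
idx_few = kl_ucb_index(mean=0.5, pulls=10, budget=math.log(100))
idx_many = kl_ucb_index(mean=0.5, pulls=100, budget=math.log(100))
```

The index is an optimistic estimate of an arm's mean: it starts well above the empirical mean and contracts as the arm accumulates pulls, which is the mechanism KL-UCB-type algorithms use to balance exploration and exploitation.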
Abstract
Motivated by real-world applications that necessitate responsible experimentation, we introduce the problem of best arm identification (BAI) with minimal regret. This innovative variant of the multi-armed bandit problem elegantly amalgamates two of its most ubiquitous objectives: regret minimization and BAI. More precisely, the agent's goal is to identify the best arm with a prescribed confidence level δ, while minimizing the cumulative regret up to the stopping time. Focusing on single-parameter exponential families of distributions, we leverage information-theoretic techniques to establish an instance-dependent lower bound on the expected cumulative regret. Moreover, we present an intriguing impossibility result that underscores the tension between cumulative regret and sample complexity in fixed-confidence BAI. Complementarily, we design and analyze the Double KL-UCB…
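One common way to write the two objectives named in the abstract is shown below. The notation is assumed here for illustration and is not taken from the paper: arm means \mu_i, best arm i^*, arm A_t pulled at round t, stopping time \tau, recommended arm \hat{\imath}_\tau, and confidence parameter \delta.

```latex
\mathbb{P}\bigl(\hat{\imath}_\tau \neq i^*\bigr) \le \delta,
\qquad
R_\tau = \mathbb{E}\Bigl[\sum_{t=1}^{\tau} \bigl(\mu_{i^*} - \mu_{A_t}\bigr)\Bigr].
```

The first condition is the fixed-confidence BAI requirement. The second is the expected cumulative regret accumulated up to the stopping time, which is the quantity to be minimized.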
Taxonomy
Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques
