Best Arm Identification in Stochastic Bandits: Beyond $\beta$-optimality
Arpan Mukherjee, Ali Tajer

TL;DR
This paper presents a new framework and algorithm for best arm identification in stochastic bandits that achieves optimal performance with computational efficiency, applicable to a broad class of distributions beyond exponential families.
Contribution
The paper introduces an approach that attains both optimality and computational efficiency by sequentially estimating allocations with sufficient fidelity. The approach applies to general distribution families.
Findings
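Achieves optimal sample complexity for BAI in the fixed-confidence setting.
Stays computationally efficient by using a simple set of decision rules instead of optimization-based methods.
Performs well across a broad class of distribution families, including ones beyond exponential families.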
Abstract
This paper investigates a hitherto unaddressed aspect of best arm identification (BAI) in stochastic multi-armed bandits in the fixed-confidence setting. Two key metrics for assessing bandit algorithms are computational efficiency and performance optimality (e.g., in sample complexity). In stochastic BAI literature, there have been advances in designing algorithms to achieve optimal performance, but they are generally computationally expensive to implement (e.g., optimization-based methods). There also exist approaches with high computational efficiency, but they have provable gaps to the optimal performance (e.g., the $\beta$-optimal approaches in top-two methods). This paper introduces a framework and an algorithm for BAI that achieves optimal performance with a computationally efficient set of decision rules. The central process that facilitates this is a routine for sequentially…
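To make the contrast in the abstract concrete, here is a minimal sketch of the kind of computationally cheap, fixed-$\beta$ top-two rule that the abstract describes as having a provable gap to optimality. It is not the algorithm proposed in the paper. It assumes unit-variance Gaussian arms, an empirical-best leader, a challenger chosen by the smallest Gaussian transportation cost, and a heuristic stopping threshold; the function name `top_two_bai` and all of its parameters are hypothetical.

```python
import math
import random


def top_two_bai(means, delta=0.05, beta=0.5, seed=0, max_pulls=100_000):
    """Illustrative fixed-beta top-two BAI on unit-variance Gaussian arms.

    Assumptions (not taken from the paper): Gaussian rewards with variance 1,
    leader = empirical best arm, challenger = arm with the smallest
    transportation cost, and a heuristic threshold log((1 + log t) / delta).
    Returns (recommended_arm, number_of_pulls).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(arm):
        # Draw one unit-variance Gaussian reward from the chosen arm.
        sums[arm] += rng.gauss(means[arm], 1.0)
        counts[arm] += 1

    # Initialization: pull every arm once.
    for arm in range(k):
        pull(arm)
    t = k

    while t < max_pulls:
        mu = [sums[a] / counts[a] for a in range(k)]
        leader = max(range(k), key=lambda a: mu[a])

        def cost(j):
            # Gaussian transportation cost between the leader and arm j.
            gap = mu[leader] - mu[j]
            return gap * gap / (2.0 * (1.0 / counts[leader] + 1.0 / counts[j]))

        challenger = min((a for a in range(k) if a != leader), key=cost)

        # Stop once even the closest challenger is separated from the leader.
        threshold = math.log((1.0 + math.log(t)) / delta)
        if cost(challenger) > threshold:
            return leader, t

        # Fixed-beta randomization: leader w.p. beta, challenger otherwise.
        pull(leader if rng.random() < beta else challenger)
        t += 1

    # Pull budget exhausted: fall back to the empirical best arm.
    mu = [sums[a] / counts[a] for a in range(k)]
    return max(range(k), key=lambda a: mu[a]), t
```

Each round costs only $O(K)$ arithmetic operations, which is the efficiency the abstract attributes to top-two methods. The fixed `beta` is the source of the gap: the leader's sampling share is pinned by a tuning parameter rather than adapted to the instance.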
Taxonomy
Topics: Advanced Bandit Algorithms Research · Forecasting Techniques and Applications · Machine Learning and Algorithms
