Fixed-Confidence Best Arm Identification with Decreasing Variance
Tamojeet Roychowdhury, Kota Srinivas Reddy, Krishna P Jagannathan, Sharayu Moharir

TL;DR
This paper studies best-arm identification in stochastic bandits whose reward variances decrease over time. It proposes two policies that trade off sampling against waiting, and reports that they outperform existing methods.
Contribution
It introduces two new policies tailored for decreasing variance scenarios, with analytical guarantees and improved performance over state-of-the-art algorithms.
Findings
Proposed policies outperform classical methods in simulations.
Analytical guarantees established for both policies.
Effective handling of decreasing variance in reward distributions.
Abstract
We focus on the problem of best-arm identification in a stochastic multi-arm bandit with temporally decreasing variances for the arms' rewards. We model arm rewards as Gaussian random variables with fixed means and variances that decrease with time. The cost incurred by the learner is modeled as a weighted sum of the time needed by the learner to identify the best arm, and the number of samples of arms collected by the learner before termination. Under this cost function, there is an incentive for the learner to not sample arms in all rounds, especially in the initial rounds. On the other hand, not sampling increases the termination time of the learner, which also increases cost. This trade-off necessitates new sampling strategies. We propose two policies. The first policy has an initial wait period with no sampling followed by continuous sampling. The second policy samples periodically…
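The cost structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `wait_then_sample`, `periodic`, `cost`, `time_weight`, and `sample_weight` are hypothetical. The sketch also assumes a fixed, known termination round (`horizon`), that every arm is sampled in each sampling round, and that periodic sampling starts at round 0; none of these details are given in the text above.

```python
from typing import List


def wait_then_sample(wait: int, horizon: int) -> List[bool]:
    """Sampling mask: skip the first `wait` rounds, then sample in every round."""
    return [t >= wait for t in range(horizon)]


def periodic(period: int, horizon: int) -> List[bool]:
    """Sampling mask: sample once every `period` rounds (assumed to start at round 0)."""
    return [t % period == 0 for t in range(horizon)]


def cost(mask: List[bool], n_arms: int, time_weight: float, sample_weight: float) -> float:
    """Weighted sum of elapsed rounds and the number of arm samples collected."""
    rounds = len(mask)  # assumption: termination happens exactly at the horizon
    samples = n_arms * sum(mask)  # assumption: all arms are sampled in a sampling round
    return time_weight * rounds + sample_weight * samples


if __name__ == "__main__":
    # Two arms, ten rounds, time weighted 1.0 and each sample weighted 0.5.
    print(cost(wait_then_sample(3, 10), n_arms=2, time_weight=1.0, sample_weight=0.5))
    print(cost(periodic(5, 10), n_arms=2, time_weight=1.0, sample_weight=0.5))
```

With the horizon held fixed, fewer sampling rounds always lower this cost. In the setting the abstract describes, the termination time itself depends on how much is sampled, which is what creates the trade-off.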
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction
Methods: Focus
