Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination
Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das

TL;DR
This paper addresses best arm identification in contaminated stochastic bandits, proposing two algorithms that achieve asymptotically optimal sample complexity despite adversarial reward contamination.
Contribution
The paper introduces two algorithms for contaminated bandits, one gap-based and one based on successive elimination, that asymptotically attain the optimal error guarantee and the optimal sample complexity.
Findings
The proposed algorithms outperform existing baselines in numerical experiments.
The proposed methods achieve asymptotically optimal error guarantees.
Sample complexity is optimal up to constant and logarithmic factors.
Abstract
This paper investigates the problem of best arm identification in contaminated stochastic multi-arm bandits. In this setting, the rewards obtained from any arm are replaced by samples from an adversarial model with probability ε. A fixed confidence (infinite-horizon) setting is considered, where the goal of the learner is to identify the arm with the largest mean. Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable. This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits. These algorithms involve mean estimates that asymptotically achieve the optimal error guarantee on the deviation of the true mean from the estimate. Furthermore, these algorithms asymptotically achieve the optimal sample complexity. Specifically, for…
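To make the setting in the abstract concrete, here is a minimal Python sketch of successive elimination under reward contamination. It is not the paper's algorithm: the summary does not specify the paper's mean estimators or thresholds, so the sketch swaps in a plain trimmed mean as the robust estimator. The function names, the fixed outlier value used as the "adversary", the trimming fraction `3 * eps`, the confidence radius, and the bias allowance `4 * eps` are all illustrative assumptions.

```python
import math
import random


def pull(mean, eps, rng, sigma=1.0, outlier=50.0):
    """Draw one reward: with probability eps the (hypothetical) adversary
    substitutes a fixed outlier, otherwise the reward is Gaussian."""
    if rng.random() < eps:
        return outlier
    return rng.gauss(mean, sigma)


def trimmed_mean(samples, trim):
    """Average after discarding the int(trim * n) smallest and largest samples."""
    xs = sorted(samples)
    k = int(trim * len(xs))
    kept = xs[k:len(xs) - k] if len(xs) > 2 * k else xs
    return sum(kept) / len(kept)


def successive_elimination(means, eps, delta=0.05, seed=0,
                           warmup=200, max_rounds=20000):
    """Return (index of the surviving arm, number of rounds used).

    Every active arm is pulled once per round. An arm is dropped when its
    upper bound falls below the best lower bound. Each bound is widened by
    an assumed bias allowance, because contamination means the true mean
    can only be pinned down up to an eps-dependent error.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    active = list(range(n_arms))
    samples = {a: [] for a in active}
    trim = min(0.45, 3 * eps)   # assumed trimming fraction per tail
    bias = 4.0 * eps            # assumed allowance for residual bias

    for t in range(1, max_rounds + 1):
        for a in active:
            samples[a].append(pull(means[a], eps, rng))
        if t < warmup:
            continue  # wait until trimming reliably removes the outliers
        est = {a: trimmed_mean(samples[a], trim) for a in active}
        # Illustrative sub-Gaussian confidence radius with a union bound.
        radius = math.sqrt(2 * math.log(4 * n_arms * t * t / delta) / t)
        best_lower = max(est.values()) - radius - bias
        active = [a for a in active if est[a] + radius + bias >= best_lower]
        if len(active) == 1:
            return active[0], t

    # Budget exhausted: fall back to the best current estimate.
    return max(active, key=lambda a: trimmed_mean(samples[a], trim)), max_rounds
```

Calling `successive_elimination([1.0, 2.0, 3.0], eps=0.05)` returns the index of the surviving arm together with the number of rounds used. The bias allowance is the part that reflects partial identifiability: two arms whose means differ by less than roughly twice that allowance are never separated by this sketch, regardless of how many samples are drawn.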
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Machine Learning and Algorithms
