Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination

Arpan Mukherjee; Ali Tajer; Pin-Yu Chen; Payel Das

arXiv:2111.07458·cs.LG·November 16, 2021

TL;DR

This paper addresses best arm identification in contaminated stochastic bandits, proposing two algorithms that achieve asymptotically optimal sample complexity despite adversarial reward contamination.

Contribution

The paper introduces two novel algorithms for contaminated bandits that attain asymptotic optimality in error guarantees and sample complexity.

Findings

1. The algorithms outperform existing baselines in numerical experiments.

2. The proposed methods achieve asymptotically optimal error guarantees.

3. Sample complexity is optimal up to constant and logarithmic factors.

Abstract

This paper investigates the problem of best arm identification in contaminated stochastic multi-armed bandits. In this setting, the rewards obtained from any arm are replaced by samples from an adversarial model with probability $ε$. A fixed-confidence (infinite-horizon) setting is considered, where the goal of the learner is to identify the arm with the largest mean. Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable. This paper proposes two algorithms for best arm identification in sub-Gaussian bandits: a gap-based algorithm and one based on successive elimination. These algorithms involve mean estimates that asymptotically achieve the optimal error guarantee on the deviation of the true mean from the estimate. Furthermore, these algorithms asymptotically achieve the optimal sample complexity. Specifically, for…
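The contamination model described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's method: the adversary here always returns one fixed large value, the clean rewards are Gaussian, and the robust estimator is a plain symmetric trimmed mean chosen only for illustration (the abstract does not specify the estimators the paper uses). The names `pull_contaminated`, `trimmed_mean`, `adversary_value`, and `trim_frac` are hypothetical.

```python
import random
import statistics


def pull_contaminated(true_mean, eps, rng, sigma=1.0, adversary_value=50.0):
    """Draw one reward from a contaminated arm.

    With probability eps the reward is replaced by an adversarial value
    (assumed here to be a fixed constant); otherwise it is a clean
    Gaussian sample with the arm's true mean.
    """
    if rng.random() < eps:
        return adversary_value
    return rng.gauss(true_mean, sigma)


def trimmed_mean(samples, trim_frac):
    """Average the samples after dropping the trim_frac smallest and largest."""
    k = int(trim_frac * len(samples))
    kept = sorted(samples)[k:len(samples) - k]
    return statistics.fmean(kept)


rng = random.Random(0)  # fixed seed so the run is reproducible
eps = 0.05              # contamination probability (illustrative value)
true_mean = 1.0

samples = [pull_contaminated(true_mean, eps, rng) for _ in range(5000)]

naive = statistics.fmean(samples)                   # pulled toward the adversary
robust = trimmed_mean(samples, trim_frac=2 * eps)   # discards the extreme tails

print(f"naive mean error:   {abs(naive - true_mean):.3f}")
print(f"trimmed mean error: {abs(robust - true_mean):.3f}")
```

In this setup the naive sample mean is dragged far from the true mean, while the trimmed mean stays close to it. The trimmed estimate still carries a small residual bias that does not shrink with more samples, which is in line with the abstract's remark that each arm's mean is only partially identifiable under contamination.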

Videos

Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Machine Learning and Algorithms