What You See May Not Be What You Get: UCB Bandit Algorithms Robust to ε-Contamination
Laura Niss, Ambuj Tewari

TL;DR
This paper introduces robust UCB algorithms for stochastic bandits with ε-contaminated rewards, proves regret bounds for them, and shows in simulation that they outperform standard bandit algorithms under adversarial contamination. The setting is motivated by applications of bandits in education.
Contribution
The paper develops two robust UCB variants, built on new concentration inequalities for robust mean estimators under ε-contamination, and shows that they achieve near-optimal regret bounds under adversarial contamination.
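To make the idea of a robust mean estimator concrete, here is a minimal sketch. This summary does not name the paper's two estimators, so the trimmed mean used here is an assumption chosen for illustration, and `trimmed_mean` and the sample values are hypothetical. It drops the most extreme samples on each side before averaging, so a single unbounded contaminated reward cannot drag the estimate arbitrarily far.

```python
import math


def trimmed_mean(samples, eps):
    """Average after dropping the ceil(eps * n) smallest and largest samples.

    Illustrative estimator only; not taken from the paper.
    """
    n = len(samples)
    k = math.ceil(eps * n)
    if 2 * k >= n:
        raise ValueError("eps is too large for this sample size")
    kept = sorted(samples)[k:n - k]
    return sum(kept) / len(kept)


# Nine clean rewards near 1.0 plus one adversarial reward (10% contamination).
clean = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.9, 1.1]
contaminated = clean + [1e6]

naive = sum(contaminated) / len(contaminated)   # dominated by the outlier
robust = trimmed_mean(contaminated, eps=0.1)    # stays close to 1.0

print(f"naive mean:   {naive:.2f}")
print(f"trimmed mean: {robust:.4f}")
```

The empirical mean is pulled to roughly the size of the outlier, while the trimmed estimate stays near the clean rewards. The cost is that trimming also discards some clean samples.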
Findings
crUCB outperforms traditional stochastic and adversarial bandit algorithms in contaminated settings.
The proposed algorithms maintain low regret when the contamination proportion is small.
Simulations show crUCB's effectiveness even when contamination constraints are exceeded.
Abstract
Motivated by applications of bandit algorithms in education, we consider a stochastic multi-armed bandit problem with ε-contaminated rewards. We allow an adversary to give arbitrary unbounded contaminated rewards with full knowledge of the past and future. We impose the constraint that for each time the proportion of contaminated rewards for any action is less than or equal to ε. We derive concentration inequalities for two robust mean estimators for sub-Gaussian distributions in the ε-contamination context. We define the ε-contaminated stochastic bandit problem and use our robust mean estimators to give two variants of a robust Upper Confidence Bound (UCB) algorithm, crUCB. Using regret derived from only the underlying stochastic rewards, both variants of crUCB achieve regret for small enough…
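The setting in the abstract can be sketched as a small simulation. This is not the paper's crUCB: the trimmed-mean estimator, the exploration bonus sqrt(2 log t / n), the periodic adversary, and all names (`run_ucb`, `period`, and so on) are assumptions made for illustration. The adversary replaces every `period`-th pull of an arm with an extreme value, so with `period = 20` the contaminated proportion for any arm never exceeds ε = 0.05 at any time.

```python
import math
import random


def trimmed_mean(samples, eps):
    """Mean after dropping about eps * n samples from each tail (illustrative)."""
    n = len(samples)
    k = min(math.ceil(eps * n), (n - 1) // 2)  # always keep at least one sample
    kept = sorted(samples)[k:n - k]
    return sum(kept) / len(kept)


def plain_mean(samples, eps):
    """Ordinary empirical mean; eps is ignored (kept for a uniform signature)."""
    return sum(samples) / len(samples)


def ucb_index(estimator, samples, eps, t):
    """Mean estimate plus a standard UCB bonus (an assumed, not derived, bonus)."""
    return estimator(samples, eps) + math.sqrt(2 * math.log(t) / len(samples))


def run_ucb(estimator, true_means, eps, horizon, period, seed=0):
    """Run a UCB loop against a periodic adversary; return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    best = max(range(n_arms), key=lambda a: true_means[a])
    history = [[] for _ in range(n_arms)]
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialise
        else:
            arm = max(range(n_arms),
                      key=lambda a: ucb_index(estimator, history[a], eps, t))
        reward = rng.gauss(true_means[arm], 1.0)
        # Adversary: every `period`-th pull of an arm is replaced so that the
        # best arm looks terrible and every other arm looks excellent.
        if (len(history[arm]) + 1) % period == 0:
            reward = -1e6 if arm == best else 1e6
        history[arm].append(reward)
    return [len(h) for h in history]


true_means = [0.0, 1.0]  # arm 1 is the better arm
robust_pulls = run_ucb(trimmed_mean, true_means, eps=0.05, horizon=2000, period=20)
naive_pulls = run_ucb(plain_mean, true_means, eps=0.05, horizon=2000, period=20)
print("trimmed-mean UCB pulls per arm:", robust_pulls)
print("plain-mean UCB pulls per arm:  ", naive_pulls)
```

In this toy setup the plain-mean variant locks onto the worse arm once a contaminated reward enters its average, while the trimmed-mean variant keeps favouring the better arm. Regret here would be measured against the underlying stochastic means, as in the abstract, not against the contaminated rewards.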
Taxonomy
Topics: Advanced Bandit Algorithms Research
