Robustness of Anytime Bandit Policies
Antoine Salomon, Jean-Yves Audibert

TL;DR
This paper studies the robustness of anytime bandit policies. It shows that no anytime policy can guarantee regret of order log(n) with probability at least 1-1/n, and it proposes anytime robust policies for bandit problems in which the arm distributions are restricted.
Contribution
It extends the negative result previously known for ucb1 (regret of order log(n) does not hold with probability at least 1-1/n) to every anytime policy, and it designs anytime robust policies for specific classes of bandit problems.
Findings
No anytime policy guarantees regret of order log(n) with probability at least 1-1/n.
This extends the negative result shown for the ucb1 policy of Auer et al. (2002) and answers an open question.
New robust policies are proposed for bandit problems with restricted arm distributions.
Abstract
This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that with probability at least 1-1/n, the regret of the policy is of order log(n). They have also shown that such a property is not shared by the popular ucb1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.
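As an illustration of what "anytime" means here, the following is a minimal sketch of the ucb1 index policy of Auer et al. (2002) on Bernoulli arms. It is not code from the paper: the function name ucb1, the arm means, the horizon, and the seed are assumptions chosen for the example. The point to notice is that the index uses only the current round t, never the total number of plays n, so the policy can be stopped at any time.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run ucb1 on Bernoulli arms with the given (hypothetical) means.

    Returns the pull counts per arm and the pseudo-regret after `horizon` plays.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of times each arm was played
    sums = [0.0] * k   # cumulative reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: play each arm once.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus.
            # The bonus depends on the current round t, not on the horizon.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    # Pseudo-regret: expected loss relative to always playing the best arm.
    best = max(means)
    regret = sum((best - m) * c for m, c in zip(means, counts))
    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.9, 0.1], horizon=2000)
    print(counts, regret)
```

A single run like this shows only one realisation of the regret. The results summarised above concern its deviations, that is, how large the regret can be on low-probability events, which would require many independent runs (different seeds) to observe empirically.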
Taxonomy
Topics: Auction Theory and Applications · Advanced Bandit Algorithms Research · Optimization and Search Problems
