Robustness of Anytime Bandit Policies

Antoine Salomon; Jean-Yves Audibert

arXiv:1107.4506·stat.ML·July 26, 2011

TL;DR

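This paper studies how the regret of anytime bandit policies deviates in a stochastic multi-armed bandit problem. It shows that no anytime policy can guarantee regret of order log(n) with probability at least 1-1/n, and it proposes anytime robust policies for problems in which the possible arm distributions are restricted.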

Contribution

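It extends the negative result of Audibert et al. (2009) on high-probability logarithmic regret, previously shown for ucb1, to every anytime policy, and it designs anytime robust policies for bandit problems with restricted sets of arm distributions.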

Findings

01

No anytime policy guarantees regret of order log(n) with probability at least 1-1/n.

02

The negative result previously shown for ucb1 (Audibert et al., 2009) is extended to every anytime policy, which answers an open question.

03

Anytime robust policies are designed for specific bandit problems in which the set of possible arm distributions is restricted.

Abstract

This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that with probability at least 1-1/n, the regret of the policy is of order log(n). They have also shown that such a property is not shared by the popular ucb1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design anytime robust policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.
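
To make "deviations of the regret" concrete, the following is a minimal simulation sketch, not the authors' code or experiments. It runs a common textbook form of ucb1 (empirical mean plus sqrt(2 log(plays so far) / pulls of the arm)) on Bernoulli arms and collects the pseudo-regret of many independent runs, so the spread across runs can be inspected rather than only the average. The function names, the Bernoulli arms, the arm means, and the horizon are all assumptions chosen for illustration.

```python
import math
import random


def ucb1_pseudo_regret(means, horizon, rng):
    """Run a textbook ucb1 on Bernoulli arms and return the pseudo-regret.

    `means` (arm means) and the Bernoulli reward model are illustrative
    assumptions, not taken from the paper.
    """
    k = len(means)
    counts = [0] * k   # number of pulls of each arm
    sums = [0.0] * k   # cumulative reward of each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull each arm once
        else:
            plays_so_far = t - 1
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(plays_so_far) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # gap of the arm that was pulled
    return regret


def regret_samples(means, horizon, runs, seed=0):
    """Return the sorted pseudo-regrets of `runs` independent ucb1 runs."""
    rng = random.Random(seed)
    return sorted(ucb1_pseudo_regret(means, horizon, rng) for _ in range(runs))


if __name__ == "__main__":
    samples = regret_samples(means=[0.5, 0.6], horizon=2000, runs=100)
    print("median regret:", samples[len(samples) // 2])
    print("largest regret:", samples[-1])
```

Comparing the median with the largest values of `samples` gives a rough empirical view of the upper tail of the regret, which is the quantity the high-probability statements in the abstract are about. A small simulation like this cannot confirm or refute those statements; it only illustrates what is being measured.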

Taxonomy

Topics: Auction Theory and Applications · Advanced Bandit Algorithms Research · Optimization and Search Problems