The best of both worlds: stochastic and adversarial bandits

Sebastien Bubeck; Aleksandrs Slivkins

arXiv:1202.4473 · cs.LG · February 22, 2012 · 132 cites

TL;DR

This paper introduces SAO (Stochastic and Adversarial Optimal), a bandit algorithm whose regret is essentially optimal for both adversarial and stochastic rewards, two settings that prior work on multi-armed bandits treated separately.

Contribution

SAO is presented as the first algorithm to jointly optimize for adversarial and stochastic rewards: it combines the square-root worst-case regret of Exp3 with the (poly)logarithmic regret of UCB1 on stochastic rewards.

Findings

01

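Against adversarial rewards, SAO essentially matches the square-root worst-case regret of Exp3.

02

On stochastic rewards, SAO attains (poly)logarithmic regret, comparable to UCB1.

03

A single algorithm covers both settings, keeping a good worst-case guarantee while still taking advantage of "nice" (stochastic) problem instances.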

Abstract

We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is, essentially, optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the square-root worst-case regret of Exp3 (Auer et al., SIAM J. on Computing 2002) and the (poly)logarithmic regret of UCB1 (Auer et al., Machine Learning 2002) for stochastic rewards. Adversarial rewards and stochastic rewards are the two main settings in the literature on (non-Bayesian) multi-armed bandits. Prior work on multi-armed bandits treats them separately, and does not attempt to jointly optimize for both. Our result falls into a general theme of achieving good worst-case performance while also taking advantage of "nice" problem instances, an important issue in the design of algorithms with partially known inputs.
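To make the two baselines named in the abstract concrete, here is a minimal Python sketch of textbook UCB1 and Exp3. It is not the SAO algorithm, whose construction is not described on this page. The helper names (`bernoulli_env`, `pseudo_regret`), the arm means, and the horizon are illustrative assumptions; the exploration rate `gamma` uses the standard tuning from the Exp3 analysis with the horizon as the bound on the best arm's total reward.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """UCB1: pull each arm once, then the arm with the highest upper confidence bound."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    choices = []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization round: one pull per arm
        else:
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm, t)
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)
    return choices


def exp3(pull, n_arms, horizon, gamma, rng):
    """Exp3: exponential weights over importance-weighted reward estimates."""
    weights = [1.0] * n_arms
    choices = []
    for t in range(1, horizon + 1):
        total = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1.0 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = pull(arm, t)
        estimate = reward / probs[arm]  # unbiased estimate of the pulled arm's reward
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        top = max(weights)
        weights = [w / top for w in weights]  # rescale; leaves probs unchanged
        choices.append(arm)
    return choices


def bernoulli_env(means, rng):
    """Hypothetical stochastic environment: arm i pays 1 with probability means[i]."""
    def pull(arm, t):
        return 1.0 if rng.random() < means[arm] else 0.0
    return pull


def pseudo_regret(choices, means):
    """Sum of gaps between the best mean and the mean of each arm played."""
    best = max(means)
    return sum(best - means[a] for a in choices)


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.5, 0.6, 0.9]  # assumed arm means, for illustration only
    horizon = 5000
    k = len(means)
    gamma = min(1.0, math.sqrt(k * math.log(k) / ((math.e - 1) * horizon)))
    env = bernoulli_env(means, rng)
    print("UCB1 pseudo-regret:", pseudo_regret(ucb1(env, k, horizon), means))
    print("Exp3 pseudo-regret:", pseudo_regret(exp3(env, k, horizon, gamma, rng), means))
```

On a stochastic instance like this one, UCB1 would typically accumulate much less regret than Exp3, while Exp3 keeps its square-root guarantee even when rewards are chosen adversarially. That trade-off is the gap the abstract says SAO is designed to close.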

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms