An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

Peter Auer; Chao-Kai Chiang

arXiv:1605.08722·cs.LG·May 30, 2016·22 cites

TL;DR

This paper introduces an algorithm whose pseudo-regret is nearly optimal in both the stochastic and the adversarial bandit setting, and proves a limit on what any such algorithm can achieve against adaptive adversaries.

Contribution

The paper proposes an algorithm with near-optimal pseudo-regret bounds for both stochastic and adversarial bandits, and shows that $O(\log n)$ pseudo-regret against stochastic bandits is incompatible with $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversaries.

Findings

01

Pseudo-regret against adversarial bandits is $O(K\sqrt{n \log n})$.

02

Pseudo-regret against stochastic bandits is $O(\sum_i (\log n)/\Delta_i)$.

03

No algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversaries.

Abstract

We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is $O(K\sqrt{n \log n})$ and against stochastic bandits the pseudo-regret is $O(\sum_i (\log n)/\Delta_i)$. We also show that no algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversarial bandits. This complements previous results of Bubeck and Slivkins (2012) that show $\tilde{O}(\sqrt{n})$ expected adversarial regret with $O((\log n)^2)$ stochastic pseudo-regret.
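To give a feel for how differently the two bounds in the abstract scale, the following minimal sketch evaluates the expressions $K\sqrt{n \log n}$ and $\sum_i (\log n)/\Delta_i$ with all constants dropped. It is not the paper's algorithm. The function names, the gap values, and the choice to skip the zero-gap (best) arm in the sum are illustrative assumptions, not taken from the paper.

```python
import math


def adversarial_bound(num_arms, horizon):
    """Evaluate K * sqrt(n * log n); hidden constants are omitted."""
    return num_arms * math.sqrt(horizon * math.log(horizon))


def stochastic_bound(gaps, horizon):
    """Evaluate sum_i log(n) / Delta_i; hidden constants are omitted.

    Assumption: arms with gap 0 (the best arm) are skipped, since the
    term would otherwise be undefined.
    """
    return sum(math.log(horizon) / gap for gap in gaps if gap > 0)


# Hypothetical instance: four arms, the first one optimal (gap 0).
example_gaps = [0.0, 0.1, 0.2, 0.5]

for horizon in (10**3, 10**5, 10**7):
    adv = adversarial_bound(len(example_gaps), horizon)
    sto = stochastic_bound(example_gaps, horizon)
    print(f"n={horizon:>8}  adversarial ~ {adv:10.1f}  stochastic ~ {sto:8.1f}")
```

On this hypothetical instance the stochastic expression grows only logarithmically in $n$, while the adversarial one grows roughly like $\sqrt{n}$. The numbers are only orders of magnitude, because the constants hidden by the $O(\cdot)$ notation are ignored.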

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques