An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits
Peter Auer, Chao-Kai Chiang

TL;DR
This paper introduces an algorithm whose pseudo-regret is nearly optimal against both stochastic and adversarial bandits, and proves a limit on what any single algorithm can guarantee in both settings at once.
Contribution
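The paper proposes a single algorithm with near-optimal pseudo-regret bounds for both stochastic and adversarial bandits, and proves that $O(\log n)$ pseudo-regret against stochastic bandits is incompatible with $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversaries.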
Findings
Pseudo-regret against adversarial bandits is $O(K\sqrt{n \log n})$.
Pseudo-regret against stochastic bandits is $O(\sum_i (\log n)/\Delta_i)$.
No algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversaries.
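To get a feel for how far apart the two upper bounds are, the following minimal sketch evaluates their growth rates with all constants dropped. The gap values, the function names, and the convention of summing only over arms with $\Delta_i > 0$ are assumptions made for illustration; they are not taken from the paper.

```python
import math

def adversarial_rate(n, k):
    """Growth rate K * sqrt(n log n) of the adversarial bound (constants dropped)."""
    return k * math.sqrt(n * math.log(n))

def stochastic_rate(n, gaps):
    """Growth rate sum_i (log n) / Delta_i of the stochastic bound (constants dropped).

    Assumption: arms with zero gap (the optimal arms) are skipped.
    """
    return sum(math.log(n) / d for d in gaps if d > 0)

# Hypothetical gaps for K = 4 arms; the first arm is optimal.
gaps = [0.0, 0.1, 0.2, 0.5]

for n in (10**3, 10**5, 10**7):
    adv = adversarial_rate(n, len(gaps))
    sto = stochastic_rate(n, gaps)
    print(f"n={n:>8}  adversarial ~ {adv:10.1f}  stochastic ~ {sto:8.1f}")
```

Because the hidden constants are ignored, only the trend is meaningful: the stochastic rate grows logarithmically in $n$, while the adversarial rate grows roughly like $\sqrt{n}$.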
Abstract
We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits. Against adversarial bandits the pseudo-regret is $O(K\sqrt{n \log n})$ and against stochastic bandits the pseudo-regret is $O(\sum_i (\log n)/\Delta_i)$. We also show that no algorithm with $O(\log n)$ pseudo-regret against stochastic bandits can achieve $\tilde{O}(\sqrt{n})$ expected regret against adaptive adversarial bandits. This complements previous results of Bubeck and Slivkins (2012) that show $\tilde{O}(\sqrt{n})$ expected adversarial regret with $O((\log n)^2)$ stochastic pseudo-regret.
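The abstract's distinction between pseudo-regret and expected regret can be made concrete with the standard definitions. The notation here is an assumption for illustration and may differ from the paper's: $x_i(t)$ is the reward of arm $i$ at time $t$, $I_t$ is the arm the algorithm plays, and $\mu_i$ is the mean reward of arm $i$ in the stochastic setting.

$$\bar{R}_n = \max_{i} \mathbb{E}\left[\sum_{t=1}^{n} x_i(t) - \sum_{t=1}^{n} x_{I_t}(t)\right], \qquad \mathbb{E}[R_n] = \mathbb{E}\left[\max_{i} \sum_{t=1}^{n} x_i(t) - \sum_{t=1}^{n} x_{I_t}(t)\right]$$

$$\Delta_i = \max_{j} \mu_j - \mu_i$$

Since the maximum of expectations never exceeds the expectation of the maximum, $\bar{R}_n \le \mathbb{E}[R_n]$, so a bound on pseudo-regret is the weaker of the two guarantees. This is why the positive results for pseudo-regret and the negative result for expected regret do not contradict each other.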
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Data Stream Mining Techniques
