Explore no more: Improved high-probability regret bounds for non-stochastic bandits

Gergely Neu

arXiv:1506.03271·cs.LG·November 4, 2015·20 cites

TL;DR

This paper introduces a loss-estimation strategy called Implicit eXploration (IX) that yields high-probability regret bounds for non-stochastic bandits without forcing the learner to explore uniformly, which improves both the theoretical guarantees and the robustness of the resulting algorithms.

Contribution

It demonstrates that high-probability regret bounds can be obtained without the traditional uniform exploration requirement using the novel IX technique.

Findings

1. Achieved high-probability regret bounds without uniform exploration.
2. Derived improved bounds for various bandit extensions.
3. Experimental results show robustness of the IX method.

Abstract

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold in expectation. One of these modifications is forcing the learner to sample arms from the uniform distribution at least $\Omega(\sqrt{T})$ times over $T$ rounds, which can adversely affect performance if many of the arms are suboptimal. While it is widely conjectured that this property is essential for proving high-probability regret bounds, we show in this paper that it is possible to achieve such strong results without this undesirable exploration component. Our result relies on a simple and…
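To make the idea of dropping forced uniform exploration concrete, here is a minimal Python sketch of an exponential-weights learner that uses an IX-style loss estimate. It is an illustration under stated assumptions, not the paper's reference implementation: the function name `exp3_ix`, the `loss_fn(t, arm)` callback, and the default tuning $\eta = \sqrt{2 \log K / (KT)}$ with $\gamma = \eta / 2$ are choices made for this example. The only change relative to a plain importance-weighted estimate is the extra $\gamma$ in the denominator, which biases the estimate downward instead of mixing the uniform distribution into the sampling probabilities.

```python
import math
import random


def exp3_ix(loss_fn, n_arms, n_rounds, eta=None, gamma=None, seed=0):
    """Exponential weights with an IX-style loss estimate (illustrative sketch).

    loss_fn(t, arm) is a hypothetical callback returning the loss in [0, 1]
    of the chosen arm at round t; losses of other arms are never observed.
    """
    rng = random.Random(seed)
    if eta is None:
        # Assumed tuning for this sketch; other schedules are possible.
        eta = math.sqrt(2.0 * math.log(n_arms) / (n_arms * n_rounds))
    if gamma is None:
        gamma = eta / 2.0

    cum_est = [0.0] * n_arms  # cumulative estimated loss of each arm
    total_loss = 0.0
    probs = [1.0 / n_arms] * n_arms

    for t in range(n_rounds):
        # Exponential weights; subtracting the minimum avoids underflow.
        base = min(cum_est)
        weights = [math.exp(-eta * (c - base)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]

        # Sample directly from the weights: no uniform mixing is applied.
        arm = rng.choices(range(n_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        total_loss += loss

        # IX estimate: divide by (p + gamma) rather than p alone.
        cum_est[arm] += loss / (probs[arm] + gamma)

    return total_loss, probs
```

A call such as `exp3_ix(lambda t, arm: [0.1, 0.9, 0.9][arm], n_arms=3, n_rounds=2000)` returns the learner's cumulative loss and its final sampling distribution. Setting `gamma=0.0` recovers the ordinary importance-weighted estimate, which is a convenient way to compare the two in small experiments.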

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics