Improved Path-length Regret Bounds for Bandits

Sébastien Bubeck; Yuanzhi Li; Haipeng Luo; Chen-Yu Wei

arXiv:1901.10604 · cs.LG · June 19, 2019 · 6 cites

Open Access

TL;DR

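This paper studies adaptive regret bounds for bandit problems stated in terms of the path-length, that is, the variation of the losses. It proves that the path-length bound of Wei and Luo (2018) cannot be improved against an adaptive adversary, introduces two new algorithms with improved bounds, and extends the results to linear bandits through new reductions.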

Contribution

The paper introduces two new algorithms, both built on the optimistic mirror descent framework, with improved path-length regret bounds for multi-armed bandits, and extends these results to linear bandits via new reduction techniques.

Findings

01

Proved that the seemingly suboptimal path-length bound of Wei and Luo (2018) cannot be improved against an adaptive adversary.

02

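Developed two algorithms: one strictly improves over Wei and Luo (2018) with a smaller path-length measure, and the other improves over Wei and Luo (2018) for an oblivious adversary when the path-length is large.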

03

Extended path-length regret bounds to linear bandits using a reduction to convex body chasing.

Abstract

We study adaptive regret bounds in terms of the variation of the losses (the so-called path-length bounds) for both multi-armed bandit and more generally linear bandit. We first show that the seemingly suboptimal path-length bound of (Wei and Luo, 2018) is in fact not improvable for adaptive adversary. Despite this negative result, we then develop two new algorithms, one that strictly improves over (Wei and Luo, 2018) with a smaller path-length measure, and the other which improves over (Wei and Luo, 2018) for oblivious adversary when the path-length is large. Our algorithms are based on the well-studied optimistic mirror descent framework, but importantly with several novel techniques, including new optimistic predictions, a slight bias towards recently selected arms, and the use of a hybrid regularizer similar to that of (Bubeck et al., 2018). Furthermore, we extend our results to…
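As a minimal illustration of the quantity these bounds depend on, the sketch below computes the path-length of a loss sequence as the sum of norms of consecutive loss-vector differences. The function name `path_length`, the choice between the sup-norm and the L1 norm, and the convention of starting the sum at the second round are assumptions made for illustration; the exact path-length measures used in the paper may differ.

```python
from typing import Sequence


def path_length(losses: Sequence[Sequence[float]], norm: str = "inf") -> float:
    """Sum over rounds of the norm of (loss_t - loss_{t-1}).

    `losses[t][i]` is the loss of arm i in round t.
    Assumption: the sum starts at the second round (no fictitious round 0).
    """
    total = 0.0
    for prev, curr in zip(losses, losses[1:]):
        # Per-arm absolute change between consecutive rounds.
        diffs = [abs(c - p) for p, c in zip(prev, curr)]
        if norm == "inf":
            total += max(diffs)   # largest change of any single arm
        elif norm == "l1":
            total += sum(diffs)   # total change across all arms
        else:
            raise ValueError(f"unknown norm: {norm!r}")
    return total


# Losses that never change have zero path-length.
constant = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
# Losses that flip every round have path-length growing linearly in the horizon.
alternating = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

print(path_length(constant))
print(path_length(alternating, norm="inf"))
print(path_length(alternating, norm="l1"))
```

A slowly varying loss sequence yields a small path-length, which is the regime where a path-length bound is stronger than a worst-case bound in the number of rounds.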

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics