Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms

Richard Combes; Alexandre Proutiere

arXiv:1405.5096·cs.LG·May 21, 2014·42 cites

TL;DR

This paper studies unimodal stochastic bandits. It derives asymptotic regret lower bounds for the discrete case and proposes OSUB, an algorithm that exploits the unimodal structure to match them, with an asymptotic regret that does not depend on the number of arms.

Contribution

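The paper introduces OSUB, an algorithm whose regret matches the asymptotic lower bound derived for discrete unimodal bandits, and gives a regret upper bound for OSUB in non-stationary environments where the expected rewards evolve smoothly over time.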

Findings

01. The regret of OSUB matches the asymptotic regret lower bound for discrete unimodal bandits.

02. The asymptotic regret of OSUB does not depend on the number of arms.

03. Discretization combined with UCB is effective for continuous unimodal bandits.

Abstract

We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope 2009, Yu 2011). The set of arms is either discrete, in which case arms correspond to the vertices of a finite graph whose structure represents similarity in rewards, or continuous, in which case arms belong to a bounded interval. For discrete unimodal bandits, we derive asymptotic lower bounds for the regret achieved under any algorithm, and propose OSUB, an algorithm whose regret matches this lower bound. Our algorithm optimally exploits the unimodal structure of the problem, and surprisingly, its asymptotic regret does not depend on the number of arms. We also provide a regret upper bound for OSUB in non-stationary environments where the expected rewards smoothly evolve over time. The…
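To make the idea of exploiting unimodal structure concrete, the following is a minimal Python sketch of a leader-plus-neighbors index policy on a line graph. It is an illustration under stated assumptions, not the paper's specification of OSUB: the Bernoulli rewards, the line-graph topology, the starting arm, the exploration constant `c = 3`, the rule of playing the leader once every `gamma + 1` turns, and all function names are choices made here for the example.

```python
import math
import random


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, budget):
    """Approximate the largest q >= mean with pulls * KL(mean, q) <= budget (bisection)."""
    lo, hi = mean, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def run_neighborhood_policy(means, horizon, seed=0, c=3.0):
    """Simulate the sketch on a line graph; returns the number of pulls per arm.

    Assumptions (not taken from the source): Bernoulli rewards, arm 0 as the
    starting point, and exploration constant c.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    leader_count = [0] * k
    gamma = 2  # maximum degree of a line graph

    def pull(arm):
        pulls[arm] += 1
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0

    pull(0)  # arbitrary starting arm
    for _ in range(horizon - 1):
        # Leader: arm with the best empirical mean among arms tried so far.
        pulled = [a for a in range(k) if pulls[a] > 0]
        leader = max(pulled, key=lambda a: sums[a] / pulls[a])
        leader_count[leader] += 1
        n_lead = leader_count[leader]
        if (n_lead - 1) % (gamma + 1) == 0:
            arm = leader  # periodically force a pull of the leader itself
        else:
            # Only the leader and its graph neighbors compete on the index.
            candidates = [a for a in (leader - 1, leader, leader + 1) if 0 <= a < k]
            budget = math.log(n_lead) + c * math.log(max(math.log(n_lead), 1.0))

            def index(a):
                if pulls[a] == 0:
                    return 1.0  # optimistic value for an untried neighbor
                return kl_ucb_index(sums[a] / pulls[a], pulls[a], budget)

            arm = max(candidates, key=index)
        pull(arm)
    return pulls
```

Calling `run_neighborhood_policy([0.1, 0.3, 0.5, 0.7, 0.9, 0.7, 0.5], horizon=5000)` returns per-arm pull counts. Because only the current leader and its neighbors are ever compared, the policy climbs toward the peak without having to sample every arm, which is the intuition behind a regret that does not scale with the number of arms.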

Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Age of Information Optimization