Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
Richard Combes, Alexandre Proutiere

TL;DR
This paper studies unimodal stochastic bandits. It derives asymptotic regret lower bounds for the discrete case and proposes OSUB, an algorithm that exploits the unimodal structure to match them; its asymptotic regret does not depend on the number of arms.
Contribution
The paper derives asymptotic regret lower bounds for discrete unimodal bandits and introduces OSUB, an algorithm whose regret matches them.
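To make "matches the lower bound" concrete, here is a hedged restatement of the form such a bound takes. It is not quoted from this page: the notation (mean rewards $\mu_k$, best arm $k^\star$, its graph neighborhood $N(k^\star)$, and the Kullback-Leibler divergence $I$ between reward distributions) is introduced here for illustration and should be checked against the paper.

```latex
\liminf_{T \to \infty} \frac{R^{\pi}(T)}{\log T}
  \;\ge\; c(\mu)
  \;=\; \sum_{k \in N(k^\star)} \frac{\mu_{k^\star} - \mu_k}{I(\mu_k, \mu_{k^\star})}
```

Under this reading, the sum runs only over the neighbors of the best arm, which would explain why the constant does not grow with the total number of arms.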
Findings
OSUB matches the asymptotic regret lower bounds.
Regret of OSUB does not depend on the number of arms.
Discretization combined with UCB is effective for continuous unimodal bandits.
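The last finding can be illustrated with a minimal sketch: split the interval [0, 1] into equal cells, treat each cell center as an arm, and run the classical UCB1 index on those arms. Everything specific here is an assumption made for illustration, not something stated on this page: the function name `ucb1_on_grid`, the Bernoulli reward model, the fixed number of cells, and the triangular mean function in the usage note. In particular, the sketch does not reproduce how the paper chooses the discretization.

```python
import math
import random


def ucb1_on_grid(mean_fn, num_arms, horizon, seed=0):
    """Run UCB1 on a uniform discretization of [0, 1].

    mean_fn maps a point in [0, 1] to an expected reward in [0, 1];
    rewards are simulated as Bernoulli draws (an illustrative assumption).
    Returns the arm centers and the number of pulls of each arm.
    """
    rng = random.Random(seed)
    centers = [(i + 0.5) / num_arms for i in range(num_arms)]
    pulls = [0] * num_arms
    sums = [0.0] * num_arms

    for t in range(1, horizon + 1):
        if t <= num_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            arm = max(
                range(num_arms),
                key=lambda k: sums[k] / pulls[k]
                + math.sqrt(2.0 * math.log(t) / pulls[k]),
            )
        reward = 1.0 if rng.random() < mean_fn(centers[arm]) else 0.0
        pulls[arm] += 1
        sums[arm] += reward

    return centers, pulls
```

For example, calling `ucb1_on_grid(lambda x: 0.9 - abs(x - 0.35), num_arms=10, horizon=20000)` simulates a unimodal mean function peaked at 0.35; the pull counts should concentrate on the cells around the peak. Note that UCB1 ignores the unimodal structure entirely, so this only shows the discretization idea.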
Abstract
We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has been recently investigated in (Cope 2009, Yu 2011). The set of arms is either discrete, in which case arms correspond to the vertices of a finite graph whose structure represents similarity in rewards, or continuous, in which case arms belong to a bounded interval. For discrete unimodal bandits, we derive asymptotic lower bounds for the regret achieved under any algorithm, and propose OSUB, an algorithm whose regret matches this lower bound. Our algorithm optimally exploits the unimodal structure of the problem, and surprisingly, its asymptotic regret does not depend on the number of arms. We also provide a regret upper bound for OSUB in non-stationary environments where the expected rewards smoothly evolve over time. The…
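The abstract does not spell out how OSUB works, so the following is only a rough sketch of one way to exploit unimodality on a graph: track the arm with the best empirical mean (the "leader") and restrict exploration to the leader and its neighbors, using a KL-based upper confidence index. The names (`osub_like`, `kl_ucb_index`, `line_graph`), the Bernoulli rewards, the initialization, the leader-sampling schedule, and the constant `c` are all assumptions made for illustration; check the paper for the actual algorithm.

```python
import math
import random


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, n, c=3.0):
    """Largest q >= mean with pulls * KL(mean, q) <= log(n) + c * log(log(n))."""
    log_n = math.log(max(n, 2))
    budget = max(0.0, log_n + c * math.log(log_n))
    lo, hi = mean, 1.0
    for _ in range(40):  # bisection on q
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def line_graph(k):
    """Neighbor lists for arms 0..k-1 arranged on a line."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < k] for i in range(k)]


def osub_like(means, neighbors, horizon, seed=0):
    """Leader-and-neighborhood exploration (a sketch, not the paper's OSUB)."""
    rng = random.Random(seed)
    k = len(means)
    gamma = max(len(nb) for nb in neighbors)  # maximum degree of the graph
    pulls = [0] * k
    sums = [0.0] * k
    leader_count = [0] * k

    def pull(arm):
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1

    for arm in range(k):  # assumed initialization: one pull per arm
        pull(arm)

    for _ in range(horizon - k):
        emp = [sums[a] / pulls[a] for a in range(k)]
        leader = max(range(k), key=lambda a: emp[a])
        leader_count[leader] += 1
        times_led = leader_count[leader]
        if (times_led - 1) % (gamma + 1) == 0:
            arm = leader  # periodically sample the leader itself
        else:
            candidates = [leader] + neighbors[leader]
            arm = max(
                candidates,
                key=lambda a: kl_ucb_index(emp[a], pulls[a], times_led),
            )
        pull(arm)

    return pulls
```

A usage example: with `means = [0.1, 0.25, 0.4, 0.55, 0.7, 0.55, 0.4, 0.25, 0.1]`, calling `osub_like(means, line_graph(len(means)), horizon=5000)` returns per-arm pull counts. Because only the leader's neighborhood is explored at each step, arms far from the peak are rarely sampled once the leader settles, which is the intuition behind a regret that does not scale with the number of arms.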
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Age of Information Optimization
