Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms
William Réveillard, Richard Combes

TL;DR
This paper studies stochastic multi-armed bandits whose expected reward function has at most m modes. It gives the first known computationally tractable algorithm for solving the Graves-Lai optimization problem in this setting, which makes asymptotically optimal algorithms implementable.
Contribution
The paper presents the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem for multimodal bandits, which in turn enables the implementation of asymptotically optimal algorithms.
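For readers unfamiliar with it, the Graves-Lai problem can be sketched in its generic structured-bandit form. The notation below is an assumption for illustration and is not taken from the paper: $\mu$ is the vector of mean rewards, $\Theta$ the set of mean vectors allowed by the structure (here, those with at most $m$ modes), $k^\star$ the optimal arm, $\Delta_k = \mu_{k^\star} - \mu_k$ the gap of arm $k$, and $d(\cdot,\cdot)$ the Kullback-Leibler divergence between reward distributions with the given means.

```latex
% Generic Graves-Lai problem (illustrative notation, not the paper's own).
\begin{align*}
C(\mu) \;=\; \min_{\eta \in \mathbb{R}_{\ge 0}^{K}} \quad
  & \sum_{k=1}^{K} \eta_k \, \Delta_k \\
\text{subject to} \quad
  & \sum_{k=1}^{K} \eta_k \, d(\mu_k, \lambda_k) \;\ge\; 1
    \quad \text{for all } \lambda \in \Lambda(\mu), \\
\text{where} \quad
  & \Lambda(\mu) \;=\; \bigl\{ \lambda \in \Theta \;:\;
    \lambda_{k^\star} = \mu_{k^\star}, \;
    \max_{k} \lambda_k > \lambda_{k^\star} \bigr\}.
\end{align*}
% For any uniformly good algorithm, the regret R(T) satisfies
% \liminf_{T \to \infty} R(T) / \log T \ge C(\mu).
```

In this generic form the constraint set $\Lambda(\mu)$ is infinite, so the difficulty lies in handling the constraints efficiently. An algorithm is called asymptotically optimal when its regret matches $C(\mu) \log T$ to first order.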
Findings
The resulting bandit algorithms are asymptotically optimal for multimodal bandits with at most m modes.
The paper gives the first known computationally tractable procedure for solving the Graves-Lai optimization problem in this setting.
Code for the algorithms is publicly available.
Abstract
We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits
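To make the structural assumption concrete, here is a minimal sketch of what "at most m modes" could mean. It assumes the arms are arranged on a line and that a mode is a strict local maximum of the expected reward. The paper's own definition may differ (for example, arms on a general graph), and the function names `count_modes` and `has_at_most_m_modes` are hypothetical, not taken from the authors' repository.

```python
from typing import Sequence


def count_modes(means: Sequence[float]) -> int:
    """Count strict local maxima of `means` along the path 0 - 1 - ... - K-1.

    Assumption for illustration: arms sit on a line and a mode is an arm
    whose mean is strictly larger than that of each existing neighbour.
    """
    n = len(means)
    modes = 0
    for k in range(n):
        # An endpoint has a single neighbour; the missing side counts as satisfied.
        left_ok = k == 0 or means[k] > means[k - 1]
        right_ok = k == n - 1 or means[k] > means[k + 1]
        if left_ok and right_ok:
            modes += 1
    return modes


def has_at_most_m_modes(means: Sequence[float], m: int) -> bool:
    """Check whether the mean vector satisfies the 'at most m modes' constraint."""
    return count_modes(means) <= m


# Hypothetical mean rewards for six arms, with peaks at arms 1 and 4.
example_means = [0.1, 0.5, 0.2, 0.3, 0.9, 0.4]
print(count_modes(example_means))
print(has_at_most_m_modes(example_means, 2))
```

Under these assumptions, a unimodal bandit corresponds to m = 1, and larger m admits reward functions with several separated peaks.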
