Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

William Réveillard; Richard Combes

arXiv:2510.25811 · stat.ML · October 31, 2025

TL;DR

This paper studies stochastic multi-armed bandits whose expected reward function has at most m modes. It gives the first known computationally tractable algorithm for solving the Graves-Lai optimization problem in this setting, which makes asymptotically optimal bandit algorithms implementable.

Contribution

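The paper presents the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem for multimodal bandits. That solution is what enables asymptotically optimal algorithms for this problem to be implemented.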

Findings

01

Solving the Graves-Lai optimization problem enables asymptotically optimal algorithms for bandits with at most m modes.

02

The paper provides the first known computationally tractable method for solving the Graves-Lai optimization problem in the multimodal setting.

03

Code for the algorithms is publicly available.

Abstract

We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits
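To make the "at most m modes" condition concrete, here is a minimal Python sketch. It assumes the arms are arranged on a line and that a mode is a strict local maximum of the expected reward vector. Both are simplifying assumptions for illustration, and the paper's exact definition (for example, the neighborhood structure or the handling of ties) may differ. The names `count_modes` and `has_at_most_m_modes` are hypothetical and are not taken from the authors' repository.

```python
from typing import Sequence


def count_modes(mu: Sequence[float]) -> int:
    """Count strict local maxima of mu, with arms laid out on a line.

    Assumption: arm k has neighbors k-1 and k+1 (when they exist), and a
    mode is an arm whose mean is strictly larger than all its neighbors.
    """
    n = len(mu)
    modes = 0
    for k in range(n):
        left_ok = k == 0 or mu[k] > mu[k - 1]
        right_ok = k == n - 1 or mu[k] > mu[k + 1]
        if left_ok and right_ok:
            modes += 1
    return modes


def has_at_most_m_modes(mu: Sequence[float], m: int) -> bool:
    """Check the structural constraint: mu has at most m modes."""
    return count_modes(mu) <= m


# Illustrative expected rewards (made-up values): peaks at arms 1 and 4.
mu_example = [0.1, 0.5, 0.3, 0.2, 0.7, 0.4]
print(count_modes(mu_example))
print(has_at_most_m_modes(mu_example, m=2))
```

Under these assumptions, a unimodal bandit is the special case m = 1, and larger values of m allow reward functions with more local peaks.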
