A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice

Zachary Chase; Shinji Ito; Idan Mehalel

arXiv:2511.00257·cs.LG·November 4, 2025

TL;DR

This paper determines the minimax optimal expected regret, up to constant factors, for non-stochastic multi-armed bandits with expert advice by proving a lower bound that matches the previously known upper bound.

Contribution

The paper proves a lower bound that matches the upper bound of Kale (2014), which settles the minimax optimal expected regret rate for this problem.

Findings

01

Minimax optimal expected regret is Θ(√(T K log(N/K)))

02

The lower bound matches the upper bound of Kale (2014)

03

Clarifies the fundamental limits of non-stochastic bandits with expert advice

Abstract

We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of Kale (2014). The two bounds determine the minimax optimal expected regret to be $\Theta\left(\sqrt{T K \log(N/K)}\right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
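
To get a feel for how the rate scales, the following minimal sketch evaluates the expression sqrt(T K log(N/K)) numerically. The function name `minimax_regret_rate` and the example values are hypothetical, the constants hidden by the Θ notation are ignored, and the sketch assumes N > K so that the logarithm is positive.

```python
import math


def minimax_regret_rate(T: int, K: int, N: int) -> float:
    """Evaluate sqrt(T * K * log(N / K)), ignoring the constants hidden by Theta.

    Assumption for this sketch: N > K, so that log(N / K) is positive.
    """
    if not (T > 0 and K > 0 and N > K):
        raise ValueError("expected T > 0, K > 0 and N > K")
    return math.sqrt(T * K * math.log(N / K))


# Illustrative values only (hypothetical, not taken from the paper).
rate = minimax_regret_rate(T=10_000, K=10, N=1_000)
quadrupled = minimax_regret_rate(T=40_000, K=10, N=1_000)

# The rate grows like sqrt(T): quadrupling the horizon doubles the value.
print(rate, quadrupled / rate)
```

Because the dependence on T is a square root, the per-round regret implied by this rate shrinks as the horizon grows, while the dependence on the number of experts is only logarithmic in N/K.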

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications