A Tight Lower Bound for Non-stochastic Multi-armed Bandits with Expert Advice
Zachary Chase, Shinji Ito, Idan Mehalel

TL;DR
This paper establishes the minimax optimal expected regret, up to constant factors, for non-stochastic multi-armed bandits with expert advice, matching the previously known upper bound and clarifying the fundamental limits of the problem.
Contribution
It proves a lower bound of Ω(√(T K log(N/K))) on the expected regret, which matches the upper bound of Kale (2014) and settles the minimax regret rate for this problem.
Findings
Minimax optimal expected regret is Θ(√(T K log(N/K)))
Lower bound matches the previously known upper bound
Clarifies the fundamental limits of non-stochastic bandits with expert advice
Abstract
We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of Kale (2014). The two bounds determine the minimax optimal expected regret to be Θ(√(T K log(N/K))), where K is the number of arms, N is the number of experts, and T is the time horizon.
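To give a feel for how this rate scales, the following is a minimal Python sketch that evaluates √(T K log(N/K)) with all constant factors omitted. The function name `minimax_regret_rate` is hypothetical, and the requirement N > K is an assumption made here so that the logarithm is positive. Neither is taken from the paper.

```python
import math


def minimax_regret_rate(T: int, K: int, N: int) -> float:
    """Evaluate sqrt(T * K * log(N / K)), ignoring constant factors.

    T: time horizon, K: number of arms, N: number of experts.
    Assumption for this sketch: N > K, so that log(N / K) > 0.
    """
    if T <= 0 or K <= 0:
        raise ValueError("T and K must be positive")
    if N <= K:
        raise ValueError("this sketch assumes N > K")
    return math.sqrt(T * K * math.log(N / K))


if __name__ == "__main__":
    # The rate grows like sqrt(T): quadrupling the horizon doubles it.
    base = minimax_regret_rate(T=1_000, K=10, N=1_000)
    longer = minimax_regret_rate(T=4_000, K=10, N=1_000)
    print(f"rate at T=1000: {base:.1f}")
    print(f"rate at T=4000: {longer:.1f} (ratio {longer / base:.2f})")
```

Because the dependence on N enters only through log(N/K), increasing the number of experts raises the rate much more slowly than increasing T or K does.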
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications
