Improved Regret Bounds for Bandits with Expert Advice
Nicolò Cesa-Bianchi, Khaled Eldowa, Emmanuel Esposito, Julia Olkhovskaya

TL;DR
This paper proves a worst-case regret lower bound for bandits with expert advice under restricted feedback that matches the known upper bound, and a new instance-based upper bound under standard feedback.
Contribution
It provides a new worst-case lower bound under restricted feedback and an improved, instance-based upper bound under standard feedback.
Findings
Lower bound of order √(K T ln(N/K)) matches known upper bounds.
New instance-based upper bound depends on expert agreement.
The new upper bound gives a logarithmic improvement over prior results for the standard feedback model.
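To get a feel for the scale of the rate √(K T ln(N/K)), the short sketch below evaluates it for given values of K (number of actions), N (number of experts), and T (time horizon). The function name `lower_bound_order` is hypothetical, constant factors are omitted because the bound is only stated up to its order, and the check N > K is an assumption made here so that the logarithm is positive.

```python
import math


def lower_bound_order(K: int, N: int, T: int) -> float:
    """Return sqrt(K * T * ln(N / K)), the order of the lower bound.

    Constants are omitted: this is the growth rate only, not an exact
    regret value. The requirement N > K is an assumption of this sketch
    that keeps ln(N / K) strictly positive.
    """
    if K < 1 or T < 1 or N <= K:
        raise ValueError("expected N > K >= 1 and T >= 1")
    return math.sqrt(K * T * math.log(N / K))


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for n_experts in (100, 1_000, 10_000):
        rate = lower_bound_order(K=10, N=n_experts, T=10_000)
        print(f"K=10, N={n_experts}, T=10000 -> order {rate:.1f}")
```

The rate grows with the square root of T and only logarithmically in the ratio N/K, so multiplying the number of experts by ten changes the value far less than multiplying the horizon by ten.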
Abstract
In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order √(K T ln(N/K)) for the worst-case regret, where K is the number of actions, N the number of experts, and T the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
Taxonomy
Topics: Advanced Bandit Algorithms Research · Healthcare Operations and Scheduling Optimization
