Improved Regret Bounds for Bandits with Expert Advice

Nicolò Cesa-Bianchi; Khaled Eldowa; Emmanuel Esposito; Julia Olkhovskaya

arXiv:2406.16802·cs.LG·June 25, 2024

TL;DR

This paper revisits the bandits with expert advice problem: it proves a worst-case regret lower bound under a restricted feedback model that matches a previously known upper bound, and it gives an improved instance-based upper bound under the standard feedback model.

Contribution

It provides a new lower bound of order √(K T ln(N/K)) under restricted feedback and an improved, instance-based upper bound for standard feedback, advancing the theoretical analysis of bandit algorithms with expert advice.

Findings

01

Under restricted feedback, a worst-case lower bound of order √(K T ln(N/K)) matches a previously known upper bound.

02

New instance-based upper bound depends on expert agreement.

03

Improves on the best available lower bound of order √(K T (ln N)/(ln K)), and the new upper bound gives a logarithmic improvement over prior results.

Abstract

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N > K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N)/(\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.
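To get a feel for the gap between the two lower-bound rates quoted in the abstract, the following minimal sketch evaluates $\sqrt{K T \ln(N/K)}$ and $\sqrt{K T (\ln N)/(\ln K)}$ for concrete values. It is only illustrative: the abstract states orders of growth, so all constants are omitted here, the function names are hypothetical, and the requirement $K \ge 2$ is an added assumption so that $\ln K > 0$.

```python
import math


def new_lower_bound_rate(K: int, N: int, T: int) -> float:
    """Rate sqrt(K * T * ln(N / K)) from the abstract, constants omitted."""
    # Assumption: K >= 2 is required only so both rates are well defined.
    if not (N > K >= 2 and T >= 1):
        raise ValueError("expected N > K >= 2 and T >= 1")
    return math.sqrt(K * T * math.log(N / K))


def previous_lower_bound_rate(K: int, N: int, T: int) -> float:
    """Rate sqrt(K * T * ln(N) / ln(K)) from the abstract, constants omitted."""
    if not (N > K >= 2 and T >= 1):
        raise ValueError("expected N > K >= 2 and T >= 1")
    return math.sqrt(K * T * math.log(N) / math.log(K))


def rate_ratio(K: int, N: int, T: int) -> float:
    """Ratio new / previous; the factor K * T cancels, so T does not matter."""
    return new_lower_bound_rate(K, N, T) / previous_lower_bound_rate(K, N, T)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    K, N, T = 10, 1000, 10_000
    print("new rate:     ", new_lower_bound_rate(K, N, T))
    print("previous rate:", previous_lower_bound_rate(K, N, T))
    print("ratio:        ", rate_ratio(K, N, T))
```

Because the ratio equals $\sqrt{\ln(N/K)\,\ln K / \ln N}$, it depends only on $K$ and $N$. Comparing rates without constants says nothing about the actual constants in either bound, so the numbers are a rough guide to scaling rather than regret values.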

Taxonomy

Topics: Advanced Bandit Algorithms Research · Healthcare Operations and Scheduling Optimization