Improved Best-of-Both-Worlds Regret for Bandits with Delayed Feedback

Ofir Schlisselberg; Tal Lancewicki; Peter Auer; Yishay Mansour

arXiv:2505.24193·cs.LG·October 21, 2025

TL;DR

This paper introduces a new algorithm for multi-armed bandits with delayed feedback that nearly matches the best possible regret bounds in both stochastic and adversarial environments, improving upon prior methods.

Contribution

A novel algorithm that achieves near-optimal regret bounds in both stochastic and adversarial bandit settings with delays, matching known lower bounds up to logarithmic factors.

Findings

01

Achieves adversarial regret of $\widetilde{O}(\sqrt{KT} + \sqrt{D})$

02

Provides stochastic regret bounds matching lower bounds under delays

03

First BoBW algorithm to match lower bounds in both regimes with delays

Abstract

We study the multi-armed bandit problem with adversarially chosen delays in the Best-of-Both-Worlds (BoBW) framework, which aims to achieve near-optimal performance in both stochastic and adversarial environments. While prior work has made progress toward this goal, existing algorithms suffer from significant gaps to the known lower bounds, especially in the stochastic setting. Our main contribution is a new algorithm that, up to logarithmic factors, matches the known lower bounds in each setting individually. In the adversarial case, our algorithm achieves regret of $\widetilde{O}(\sqrt{KT} + \sqrt{D})$, which is optimal up to logarithmic terms, where $T$ is the number of rounds, $K$ is the number of arms, and $D$ is the cumulative delay. In the stochastic case, we provide a regret bound which scales as $\sum_{i:\Delta_i>0}\left(\log T/\Delta_i\right) + \frac{1}{K}\sum \Delta_i…
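To make the quantities in the abstract concrete, the following is a minimal sketch that evaluates only the first term of the stochastic bound, $\sum_{i:\Delta_i>0} \log T/\Delta_i$, and the cumulative delay $D$ for a toy instance. It is not the paper's algorithm. The function names, the mean losses, the constant delay of 3 rounds, and the convention that $\Delta_i$ is the difference between arm $i$'s mean loss and the best arm's mean loss are all assumptions made for illustration; constants and logarithmic factors are ignored, and the truncated remainder of the bound is deliberately left out.

```python
import math


def gap_dependent_term(mean_losses, horizon):
    """Sum of log(T) / Delta_i over arms with Delta_i > 0.

    Assumes Delta_i = mean_losses[i] - min(mean_losses); constants
    and lower-order terms of the actual bound are omitted.
    """
    best = min(mean_losses)
    gaps = [m - best for m in mean_losses]
    return sum(math.log(horizon) / g for g in gaps if g > 0)


def cumulative_delay(delays):
    """D: the sum of the feedback delays over all rounds."""
    return sum(delays)


# Hypothetical toy instance: 3 arms, 10,000 rounds, every
# observation arrives 3 rounds late.
mean_losses = [0.2, 0.5, 0.7]
T = 10_000
delays = [3] * T

term = gap_dependent_term(mean_losses, T)
D = cumulative_delay(delays)
print(f"gap-dependent term: {term:.2f}, cumulative delay D: {D}")
```

In this toy instance the gap-dependent term depends on $T$ only through $\log T$, while $D$ grows linearly in $T$ under a constant delay, which is why the way $D$ enters the bound matters.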

Videos

Improved Best-of-Both-Worlds Regret for Bandits with Delayed Feedback · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Distributed Sensor Networks and Detection Algorithms