A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays

Saeed Masoudian; Julian Zimmert; Yevgeny Seldin

arXiv:2308.10675·cs.LG·May 29, 2024

TL;DR

This paper introduces a best-of-both-worlds algorithm for bandits with variably delayed feedback that tolerates excessive delays of up to order $T$. It removes both the need for prior knowledge of the maximal delay and the linear dependence of the regret on it.

Contribution

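It presents the first implicit exploration scheme that works in the best-of-both-worlds setting, the first control of distribution drift that does not rely on bounded delays, and a procedure relating standard regret to drifted regret that also does not rely on bounded delays.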

Findings

01

Handles arbitrary delays up to the time horizon T

02

Introduces implicit exploration in the best-of-both-worlds setting

03

Relates regret to the amount of missing information rather than to delay length
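
To illustrate the distinction drawn in finding 03, the following minimal Python sketch counts the observations still outstanding at each round, which is one common way in the delayed-bandit literature to quantify missing information. The function name `outstanding_counts`, the 0-indexed rounds, and the convention that feedback from round `s` with delay `d` becomes usable from round `s + d + 1` onward are assumptions made for illustration, not details taken from the paper.

```python
def outstanding_counts(delays):
    """For each round t, count earlier rounds whose feedback has not arrived.

    Assumed convention (illustrative only): the loss from round s with
    delay d is usable from round s + d + 1 onward, so it is outstanding
    at round t whenever s < t <= s + d.
    """
    horizon = len(delays)
    return [
        sum(1 for s in range(t) if s + delays[s] >= t)
        for t in range(horizon)
    ]


# One observation delayed until the end, nine observations with no delay.
delays = [9, 0, 0, 0, 0, 0, 0, 0, 0, 0]
counts = outstanding_counts(delays)
print(max(delays), max(counts))  # prints: 9 1
```

In this toy sequence the maximal delay is as large as the horizon allows, yet at most one observation is missing at any round, so a quantity based on outstanding observations stays small where a bound scaling with the maximal delay would not.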

Abstract

We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback. In contrast to prior work, which required prior knowledge of the maximal delay $d_{\max}$ and had a linear dependence of the regret on it, our algorithm can tolerate arbitrary excessive delays up to order $T$ (where $T$ is the time horizon). The algorithm is based on three technical innovations, which may all be of independent interest: (1) We introduce the first implicit exploration scheme that works in the best-of-both-worlds setting. (2) We introduce the first control of distribution drift that does not rely on boundedness of delays. The control is based on the implicit exploration scheme and adaptive skipping of observations with excessive delays. (3) We introduce a procedure relating standard regret with drifted regret that does not rely on boundedness of delays. At the conceptual level,…
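
Innovation (1) concerns implicit exploration. As a rough illustration of the general idea, the sketch below implements the standard implicit-exploration loss estimator known from the adversarial bandit literature, which adds a bias term to the sampling probability inside the importance weight. It is not claimed to be the estimator used in this paper; the names `ix_loss_estimates` and `gamma`, and the fixed value chosen for `gamma`, are assumptions for illustration only.

```python
def ix_loss_estimates(observed_loss, played_arm, probs, gamma):
    """Implicit-exploration (IX) loss estimates for every arm.

    The played arm gets observed_loss / (p + gamma); all other arms get 0.
    With gamma = 0 this reduces to the usual importance-weighted estimate.
    gamma is a hypothetical, fixed bias parameter in this sketch.
    """
    return [
        observed_loss / (p + gamma) if arm == played_arm else 0.0
        for arm, p in enumerate(probs)
    ]


# An arm played with tiny probability: compare the two estimators.
probs = [0.001, 0.999]
plain = ix_loss_estimates(1.0, played_arm=0, probs=probs, gamma=0.0)
biased = ix_loss_estimates(1.0, played_arm=0, probs=probs, gamma=0.1)
print(plain[0], biased[0])
```

For losses in $[0, 1]$ the biased estimate never exceeds $1/\gamma$, whereas the plain importance-weighted estimate grows without bound as the sampling probability shrinks. The price is a downward bias in the estimate.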

Videos

A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays · slideslive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Data Stream Mining Techniques