A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays
Saeed Masoudian, Julian Zimmert, Yevgeny Seldin

TL;DR
This paper introduces a robust best-of-both-worlds algorithm for bandits with highly variable and excessive delays, eliminating the need for prior delay knowledge and improving regret bounds.
Contribution
It presents the first implicit exploration scheme for the best-of-both-worlds setting, the first control of distribution drift that does not rely on bounded delays, and a procedure relating standard regret to drifted regret, advancing bandit algorithms for delayed feedback.
Findings
Handles arbitrary delays up to the time horizon T
Introduces implicit exploration in the best-of-both-worlds setting
Relates regret to the amount of missing information rather than to delay length
Abstract
We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback. In contrast to prior work, which required prior knowledge of the maximal delay and had a linear dependence of the regret on it, our algorithm can tolerate arbitrary excessive delays up to order T (where T is the time horizon). The algorithm is based on three technical innovations, which may all be of independent interest: (1) We introduce the first implicit exploration scheme that works in the best-of-both-worlds setting. (2) We introduce the first control of distribution drift that does not rely on boundedness of delays. The control is based on the implicit exploration scheme and adaptive skipping of observations with excessive delays. (3) We introduce a procedure relating standard regret with drifted regret that does not rely on boundedness of delays. At the conceptual level,…
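To make the two ingredients named in innovation (2) concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes the generic implicit-exploration (IX) loss estimator known from the bandit literature, in which the importance weight divides by the play probability plus a bias term, and it replaces the paper's adaptive skipping rule with a fixed threshold purely for illustration. The names ix_loss_estimate, should_skip, gamma, and threshold are hypothetical.

```python
import numpy as np


def ix_loss_estimate(loss, arm, probs, gamma):
    """Generic IX loss estimate (an assumption, not the paper's exact scheme).

    Only the played arm gets a nonzero entry. Dividing by
    probs[arm] + gamma instead of probs[arm] biases the estimate
    downward but caps it at loss / gamma.
    """
    probs = np.asarray(probs, dtype=float)
    estimate = np.zeros_like(probs)
    estimate[arm] = loss / (probs[arm] + gamma)
    return estimate


def should_skip(delay, threshold):
    """Illustrative skipping rule with a fixed threshold.

    The paper describes adaptive skipping; a constant threshold is a
    simplification used here only to show where skipping enters.
    """
    return delay > threshold


# Usage: arm 1 was played with probability 0.75 and incurred loss 0.5.
estimate = ix_loss_estimate(loss=0.5, arm=1, probs=[0.25, 0.75], gamma=0.25)
# An observation delayed by 120 rounds is dropped under a threshold of 100.
dropped = should_skip(delay=120, threshold=100)
```

The bias term gamma keeps every estimate bounded even when an arm's play probability is close to zero, which is the property that makes such estimators useful for controlling how far the sampling distribution can drift while feedback is outstanding.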
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Data Stream Mining Techniques
