Online Learning under Delayed Feedback
Pooria Joulani, András György, Csaba Szepesvári

TL;DR
This paper systematically studies online learning with delayed feedback, shows that delay increases regret multiplicatively in adversarial problems but only additively in stochastic problems, and proposes algorithms that handle delays efficiently.
Contribution
It provides a comprehensive analysis of delay effects on regret and introduces meta-algorithms and modified UCB algorithms for delayed feedback scenarios.
Findings
Delay increases regret multiplicatively in adversarial settings.
Delay affects regret additively in stochastic settings.
The modified UCB algorithms handle delayed bandit feedback and can be implemented with lower complexity than the black-box meta-algorithms.
Abstract
Online learning with delayed feedback has received increasing attention recently due to its several applications in distributed, web-based learning problems. In this paper we provide a systematic study of the topic, and analyze the effect of delay on the regret of online learning algorithms. Somewhat surprisingly, it turns out that delay increases the regret in a multiplicative way in adversarial problems, and in an additive way in stochastic problems. We give meta-algorithms that transform, in a black-box fashion, algorithms developed for the non-delayed case into ones that can handle the presence of delays in the feedback loop. Modifications of the well-known UCB algorithm are also developed for the bandit problem with delayed feedback, with the advantage over the meta-algorithms that they can be implemented with lower complexity.
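To make the delayed bandit setting concrete, here is a minimal sketch of a UCB1-style learner that builds its indices only from rewards that have already arrived. It is an illustration under stated assumptions, not the authors' exact algorithm: the function name `delayed_ucb`, the Bernoulli rewards, the constant delay, and the rule for arms with no observed feedback are all choices made for this example.

```python
import math
import random
from collections import deque


def delayed_ucb(means, horizon, delay, seed=0):
    """Simulate a UCB1-style learner whose rewards arrive `delay` rounds late.

    Assumptions (illustrative only): Bernoulli rewards with the given means
    and a constant delay. Returns the number of pulls of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms          # how often each arm has been played
    observed = [0] * n_arms       # how many rewards have arrived per arm
    reward_sum = [0.0] * n_arms   # sum of the rewards that have arrived
    in_flight = deque()           # (arrival_round, arm, reward) in FIFO order

    for t in range(1, horizon + 1):
        # Deliver every reward whose delay has elapsed.
        while in_flight and in_flight[0][0] <= t:
            _, arm, reward = in_flight.popleft()
            observed[arm] += 1
            reward_sum[arm] += reward

        # Arms with no observed feedback are played first (least-pulled first).
        unseen = [a for a in range(n_arms) if observed[a] == 0]
        if unseen:
            arm = min(unseen, key=lambda a: pulls[a])
        else:
            # UCB index computed from observed counts, not from pull counts.
            arm = max(
                range(n_arms),
                key=lambda a: reward_sum[a] / observed[a]
                + math.sqrt(2.0 * math.log(t) / observed[a]),
            )

        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        # With delay 0 the reward is available before the next decision.
        in_flight.append((t + delay + 1, arm, reward))

    return pulls
```

For example, `delayed_ucb([0.2, 0.8], horizon=2000, delay=10)` returns the per-arm pull counts. The only difference from the non-delayed case is that the index uses the count of observed rewards rather than the count of pulls, so outstanding feedback is simply ignored until it arrives.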
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
