Improved Regret for Bandit Convex Optimization with Delayed Feedback
Yuanyu Wan, Chang Yao, Mingli Song, Lijun Zhang

TL;DR
This paper introduces a new algorithm for bandit convex optimization with delayed feedback. Its regret bounds are tighter than previous ones, and the delay-dependent part matches the known lower bound in the worst case, where the average delay is close to the maximum delay.
Contribution
The authors develop a novel algorithm that improves regret bounds for delayed feedback in bandit convex optimization, achieving tight bounds in worst-case delay scenarios.
Findings
Regret bound of O(√n T^{3/4} + √(dT)) for general convex functions, where n is the dimensionality, T the time horizon, and d the maximum delay.
Enhanced regret bound of O((nT)^{2/3} log^{1/3} T + d log T) for strongly convex functions.
Extension to unconstrained action sets with regret O(n√(T log T) + d log T) for strongly convex and smooth functions.
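One way to read the general bound: its delay term √(dT) exceeds the delay-independent term √n T^{3/4} exactly when d > n√T. The sketch below evaluates the three bounds with every hidden constant set to 1 (an assumption; the bounds only state orders of growth) and checks that crossover. The function names are hypothetical helpers for illustration, not code from the paper.

```python
import math

def general_bound(n: int, T: int, d: int) -> float:
    # O(sqrt(n) T^{3/4} + sqrt(d T)) with all constants assumed to be 1.
    return math.sqrt(n) * T ** 0.75 + math.sqrt(d * T)

def strongly_convex_bound(n: int, T: int, d: int) -> float:
    # O((nT)^{2/3} log^{1/3} T + d log T) with all constants assumed to be 1.
    return (n * T) ** (2 / 3) * math.log(T) ** (1 / 3) + d * math.log(T)

def unconstrained_bound(n: int, T: int, d: int) -> float:
    # O(n sqrt(T log T) + d log T), strongly convex and smooth losses, constants assumed to be 1.
    return n * math.sqrt(T * math.log(T)) + d * math.log(T)

def delay_term_dominates(n: int, T: int, d: int) -> bool:
    # sqrt(d T) > sqrt(n) T^{3/4}  <=>  d T > n T^{3/2}  <=>  d > n sqrt(T)
    return d > n * math.sqrt(T)
```

For example, with n = 10 and T = 10^6 we get n√T = 10^4, so under this reading the delay term of the general bound only dominates once the maximum delay exceeds 10,000 rounds.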
Abstract
We investigate bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed, under an arbitrary delay. Let n, T, and d̄ denote the dimensionality, time horizon, and average delay, respectively. Previous studies have achieved a regret bound for this problem whose delay-independent part, O(√n T^{3/4}), matches the regret of the classical non-delayed bandit gradient descent algorithm. However, there is a large gap between its delay-dependent part and an existing Ω(√(dT)) lower bound, where d is the maximum delay. In this paper, we illustrate that this gap can be filled in the worst case, where d̄ is very close to the maximum delay d. Specifically, we first develop a novel algorithm, and prove that it enjoys a regret bound of O(√n T^{3/4} + √(dT)) in general. Compared with…
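To make the setting concrete, the sketch below combines the classical one-point gradient estimator of bandit gradient descent, (n/δ) f(x + δu) u with u uniform on the unit sphere, with a naive rule that applies each update only when its delayed loss value arrives. It is not the algorithm proposed in the paper. The Euclidean-ball action set, the values of δ and η, the delay pattern, and the names `delayed_bgd`, `sample_unit_sphere`, and `project_ball` are all assumptions for illustration.

```python
import math
import random

def sample_unit_sphere(n, rng):
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def project_ball(x, radius):
    # Euclidean projection onto the centered ball of the given radius.
    norm = math.sqrt(sum(c * c for c in x))
    return x if norm <= radius else [c * radius / norm for c in x]

def delayed_bgd(loss_fns, delays, n, radius=1.0, delta=0.1, eta=0.01, seed=0):
    """Hypothetical sketch: one-point BGD where f_t(y_t) arrives at the end of round t + delays[t]."""
    rng = random.Random(seed)
    x = [0.0] * n
    inner = radius - delta  # keep x in a shrunk ball so that y = x + delta * u stays feasible
    pending = {}  # arrival round -> list of (loss value, sampled direction u)
    played = []
    for t, f in enumerate(loss_fns):
        u = sample_unit_sphere(n, rng)
        y = [xi + delta * ui for xi, ui in zip(x, u)]
        played.append(y)
        # Only the scalar loss value is observed, and only after delays[t] rounds.
        pending.setdefault(t + delays[t], []).append((f(y), u))
        for value, u_s in pending.pop(t, []):
            g = [(n / delta) * value * ui for ui in u_s]  # one-point gradient estimate
            x = project_ball([xi - eta * gi for xi, gi in zip(x, g)], inner)
    return played

# Usage: a fixed quadratic loss in 3 dimensions with delays cycling through 0..6.
T, n = 200, 3
losses = [lambda y: sum((yi - 0.5) ** 2 for yi in y)] * T
played = delayed_bgd(losses, [t % 7 for t in range(T)], n)
```

Feedback whose arrival round falls after the horizon is simply never used, which is one way delays hurt this naive scheme.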
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and ELM · Smart Grid Energy Management
