Improved Regret for Bandit Convex Optimization with Delayed Feedback

Yuanyu Wan; Chang Yao; Mingli Song; Lijun Zhang

arXiv:2402.09152 · cs.LG · June 25, 2024 · 1 citation

TL;DR

This paper introduces a new algorithm for bandit convex optimization with delayed feedback. The delay-dependent part of its regret bound matches the existing lower bound in the worst case, where the average delay is close to the maximum delay.

Contribution

The authors develop a novel algorithm for bandit convex optimization with delayed feedback whose regret bound improves on previous results and whose delay-dependent part is tight in the worst-case delay scenario.

Findings

01

Regret bound of O(√n T^{3/4} + √(dT)) for general convex functions, where d is the maximum delay.

02

Enhanced regret bound of O((nT)^{2/3} log^{1/3} T + d log T) for strongly convex functions.

03

Extension to unconstrained action sets with regret O(n√(T log T) + d log T) for strongly convex and smooth functions.
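To get a feel for when the delay matters, the leading terms of the three bounds can be evaluated numerically. The sketch below is only an illustration: it drops all constants hidden by the O-notation, reads the delay term of the first bound as √(dT), and uses hypothetical function names that do not come from the paper.

```python
import math

def general_convex_bound(n, T, d):
    """Return (delay-independent, delay-dependent) terms of sqrt(n) T^{3/4} + sqrt(d T)."""
    return math.sqrt(n) * T ** 0.75, math.sqrt(d * T)

def strongly_convex_bound(n, T, d):
    """Return the two terms of (n T)^{2/3} log^{1/3} T + d log T."""
    return (n * T) ** (2 / 3) * math.log(T) ** (1 / 3), d * math.log(T)

def unconstrained_bound(n, T, d):
    """Return the two terms of n sqrt(T log T) + d log T."""
    return n * math.sqrt(T * math.log(T)), d * math.log(T)

if __name__ == "__main__":
    # Illustrative values only: dimension 4, horizon 10^4, maximum delay 100.
    n, T, d = 4, 10_000, 100
    for name, bound in [
        ("general convex", general_convex_bound),
        ("strongly convex", strongly_convex_bound),
        ("unconstrained", unconstrained_bound),
    ]:
        base, delay = bound(n, T, d)
        print(f"{name}: delay-independent={base:.1f}, delay-dependent={delay:.1f}")
```

With these values, the delay term of the general bound (about 1000) stays below the delay-independent term (about 2000). Up to constants, the two terms balance only once d reaches n√T.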

Abstract

We investigate bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed under an arbitrary delay. Let $n, T, \bar{d}$ denote the dimensionality, time horizon, and average delay, respectively. Previous studies have achieved an $O(\sqrt{n} T^{3/4} + (n \bar{d})^{1/3} T^{2/3})$ regret bound for this problem, whose delay-independent part matches the regret of the classical non-delayed bandit gradient descent algorithm. However, there is a large gap between its delay-dependent part, i.e., $O((n \bar{d})^{1/3} T^{2/3})$, and an existing $\Omega(\sqrt{\bar{d} T})$ lower bound. In this paper, we illustrate that this gap can be filled in the worst case, where $\bar{d}$ is very close to the maximum delay $d$. Specifically, we first develop a novel algorithm, and prove that it enjoys a regret bound of $O(\sqrt{n} T^{3/4} + \sqrt{d T})$ in general. Compared with…
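For readers unfamiliar with the setting, the following is a minimal sketch of bandit gradient descent with the classical one-point gradient estimator, run naively under delayed feedback. It is not the algorithm proposed in the paper. The function name `delayed_bgd`, the Euclidean-ball action set, the step size `eta`, the exploration radius `delta`, and the convention that a loss value with delay `d_t` arrives at the end of round `t + d_t - 1` are all assumptions made for illustration.

```python
import numpy as np

def delayed_bgd(loss_fns, delays, n, radius=1.0, eta=0.01, delta=0.1, seed=0):
    """Naive one-point bandit gradient descent on a ball with delayed feedback.

    loss_fns[t] is the loss of round t and delays[t] >= 1 is its delay: the
    scalar loss_fns[t](y_t) is assumed to arrive at the end of round
    t + delays[t] - 1. Only that scalar is ever used, never a gradient.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n)                 # center point, kept in the shrunk ball
    pending = {}                    # arrival round -> [(loss value, direction)]
    actions = []
    for t in range(len(loss_fns)):
        u = rng.standard_normal(n)
        u /= np.linalg.norm(u)      # uniform direction on the unit sphere
        y = x + delta * u           # perturbed action actually played
        actions.append(y)
        value = loss_fns[t](y)      # bandit feedback, revealed later
        pending.setdefault(t + delays[t] - 1, []).append((value, u))
        # Apply every piece of feedback that arrives in this round.
        for value_s, u_s in pending.pop(t, []):
            g = (n / delta) * value_s * u_s   # one-point gradient estimate
            x = x - eta * g
            norm = np.linalg.norm(x)
            limit = radius - delta            # keep x + delta * u feasible
            if norm > limit:
                x = x * (limit / norm)        # Euclidean projection
    return np.array(actions)

if __name__ == "__main__":
    T, n = 200, 3
    losses = [lambda y: float(np.sum((y - 0.5) ** 2)) for _ in range(T)]
    delays = [1 + (t % 5) for t in range(T)]
    played = delayed_bgd(losses, delays, n)
    print(played.shape)
```

Feedback that would arrive after the last round is simply dropped in this sketch. Projecting onto the ball of radius `radius - delta` keeps every played action inside the original ball.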

Videos

Improved Regret for Bandit Convex Optimization with Delayed Feedback· slideslive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and ELM · Smart Grid Energy Management