Online Nonsubmodular Optimization with Delayed Feedback in the Bandit Setting

Sifan Yang, Yuanyu Wan, and Lijun Zhang

arXiv:2508.00523 · cs.LG · August 4, 2025

TL;DR

This paper introduces two algorithms for online nonsubmodular optimization with delayed feedback in the bandit setting. One achieves a regret bound that is less sensitive to irregular delays; the other decouples the effect of delays from that of bandit feedback. Both are validated empirically.

Contribution

The paper proposes two algorithms that improve on the previous $O(n d^{1/3} T^{2/3})$ regret bound for bandit optimization with delayed feedback, reducing its sensitivity to irregular delays and decoupling the effects of delays and bandit feedback.

Findings

01

Achieved a regret bound of $O(n\bar{d}^{1/3}T^{2/3})$ with the first algorithm.

02

Extended the method with a blocking update to obtain an $O(n(T^{2/3} + \sqrt{dT}))$ regret bound.

03

Demonstrated the effectiveness of the algorithms through experiments on structured sparse learning.
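As a rough illustration of how these bound shapes behave under irregular delays, the sketch below evaluates them on a made-up delay sequence in which a single round is delayed far longer than the rest. It assumes that $\bar{d}$ denotes the average delay and $d$ the maximum delay, drops every constant hidden in the $O(\cdot)$ notation, and compares against the earlier $O(n d^{1/3} T^{2/3})$ bound quoted in the abstract. The helper names and the values of `n`, `T`, and `delays` are hypothetical and are not taken from the paper's experiments.

```python
import math


def delay_stats(delays):
    """Return (maximum delay, average delay) of a delay sequence."""
    return max(delays), sum(delays) / len(delays)


def bound_max_delay(n, d, T):
    # Shape of the earlier bound: n * d^(1/3) * T^(2/3), constants dropped.
    return n * d ** (1 / 3) * T ** (2 / 3)


def bound_avg_delay(n, d_bar, T):
    # Shape of the first bound above, with d_bar ASSUMED to be the average delay.
    return n * d_bar ** (1 / 3) * T ** (2 / 3)


def bound_decoupled(n, d, T):
    # Shape of the second bound above: the delay term sqrt(d * T) is added
    # to the bandit term T^(2/3) instead of multiplying it.
    return n * (T ** (2 / 3) + math.sqrt(d * T))


# Hypothetical setting: every round is delayed by 1 except one delayed by 1000.
n, T = 10, 10_000
delays = [1] * (T - 1) + [1000]
d, d_bar = delay_stats(delays)

print(f"max delay d = {d}, average delay = {d_bar:.4f}")
print(f"max-delay bound shape: {bound_max_delay(n, d, T):.0f}")
print(f"avg-delay bound shape: {bound_avg_delay(n, d_bar, T):.0f}")
print(f"decoupled bound shape: {bound_decoupled(n, d, T):.0f}")
```

In this toy setting one outlier inflates the maximum delay while barely moving the average, which is the kind of irregularity the abstract describes. The numbers only compare growth rates, since the true constants are unknown here.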

Abstract

We investigate online nonsubmodular optimization with delayed feedback in the bandit setting, where the loss function is $\alpha$-weakly DR-submodular and $\beta$-weakly DR-supermodular. Previous work has established an $(\alpha, \beta)$-regret bound of $O(n d^{1/3} T^{2/3})$, where $n$ is the dimensionality and $d$ is the maximum delay. However, its regret bound relies on the maximum delay and is thus sensitive to irregular delays. Additionally, it couples the effects of delays and bandit feedback, as its bound is the product of the delay term and the $O(n T^{2/3})$ regret bound in the bandit setting without delayed feedback. In this paper, we develop two algorithms to address these limitations, respectively. Firstly, we propose a novel method, namely DBGD-NF, which employs the one-point gradient estimator and utilizes all the available estimated gradients in each…
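The abstract mentions a one-point gradient estimator without giving its form in the visible text. For orientation only, the sketch below shows the classic spherical one-point estimator from bandit convex optimization, $\hat{g} = \frac{n}{\delta} f(x + \delta u)\, u$ with $u$ drawn uniformly from the unit sphere. This is a generic stand-in and not necessarily the estimator DBGD-NF uses; the function names, the smoothing radius `delta`, and the linear test loss are all hypothetical.

```python
import math
import random


def sample_unit_sphere(n, rng):
    """Draw a vector uniformly from the unit sphere in R^n."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def one_point_gradient(f, x, delta, rng):
    """Generic one-point estimate (n / delta) * f(x + delta * u) * u.

    Only ONE function value is queried, which is all that bandit
    feedback provides in a round.
    """
    n = len(x)
    u = sample_unit_sphere(n, rng)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])
    return [(n / delta) * value * ui for ui in u]


def average_estimate(f, x, delta, num_samples, seed=0):
    """Average many independent one-point estimates at the same point."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = one_point_gradient(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]


# Hypothetical linear loss f(x) = <a, x>, whose true gradient is a everywhere.
a = [1.0, -2.0, 0.5]


def linear_loss(x):
    return sum(ai * xi for ai, xi in zip(a, x))


estimate = average_estimate(linear_loss, [0.0, 0.0, 0.0], delta=0.1, num_samples=20000)
print("true gradient:    ", a)
print("averaged estimate:", [round(e, 2) for e in estimate])
```

A single estimate is very noisy; only the average over many draws approaches the gradient, and for a linear loss the estimator is exactly unbiased. That noise is one reason an algorithm might want to use every estimated gradient that becomes available once delayed feedback arrives.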
