TL;DR
This paper introduces two algorithms for online nonsubmodular optimization with delayed feedback in the bandit setting. Their regret bounds are less sensitive to irregular delays and decouple the effect of delays from that of bandit feedback, and experiments support their effectiveness.
Contribution
The paper proposes two algorithms that improve regret bounds for bandit optimization with delayed feedback. The first replaces the dependence on the maximum delay with a dependence on the average delay, and the second decouples the effects of delays and bandit feedback.
Findings
Achieved a regret bound of $O(n\bar{d}^{1/3}T^{2/3})$ with the first algorithm, where $\bar{d}$ is the average delay.
Extended the method with a blocking update mechanism to obtain an $O(n(T^{2/3} + \sqrt{dT}))$ regret bound, where $d$ is the maximum delay.
Demonstrated the effectiveness of the algorithms through experiments on structured sparse learning.
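To see why the dependence on the maximum delay matters, the bounds can be compared numerically with all constants dropped. This is only a back-of-the-envelope sketch: $O(\cdot)$ hides constants and lower-order terms, so only relative scaling means anything. The earlier bound is taken as $O(nd^{1/3}T^{2/3})$ with $d$ the maximum delay, as reported in the paper's abstract. The helper names (`prior_bound`, `dbgd_nf_bound`, `blocking_bound`, `delay_stats`) are hypothetical, and the delay sequences are made-up illustrations, not data from the paper.

```python
import math


def prior_bound(n, d_max, T):
    # Earlier bound O(n * d^(1/3) * T^(2/3)); d = maximum delay. Constants dropped.
    return n * d_max ** (1 / 3) * T ** (2 / 3)


def dbgd_nf_bound(n, d_avg, T):
    # First algorithm: O(n * dbar^(1/3) * T^(2/3)); dbar = average delay.
    return n * d_avg ** (1 / 3) * T ** (2 / 3)


def blocking_bound(n, d_max, T):
    # Blocking-update variant: O(n * (T^(2/3) + sqrt(d * T))); the delay enters additively.
    return n * (T ** (2 / 3) + math.sqrt(d_max * T))


def delay_stats(delays):
    # Maximum and average of a sequence of per-round delays.
    return max(delays), sum(delays) / len(delays)


if __name__ == "__main__":
    n, T = 10, 10**6
    # Made-up irregular delays: every round delayed by 1 except one very late round.
    d_max, d_avg = delay_stats([1] * (T - 1) + [100_000])
    print("irregular:", prior_bound(n, d_max, T), dbgd_nf_bound(n, d_avg, T), blocking_bound(n, d_max, T))
    # Made-up uniform delays: every round delayed by 1000.
    d_max, d_avg = delay_stats([1000] * T)
    print("uniform:  ", prior_bound(n, d_max, T), dbgd_nf_bound(n, d_avg, T), blocking_bound(n, d_max, T))
```

With a single very late round, the average-delay bound is far smaller than the maximum-delay bound. When every round carries the same large delay, the additive blocking bound comes out smallest. This mirrors the two limitations the paper targets, but it says nothing about the constants hidden in each bound.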
Abstract
We investigate online nonsubmodular optimization with delayed feedback in the bandit setting, where the loss function is $\alpha$-weakly DR-submodular and $\beta$-weakly DR-supermodular. Previous work has established an $(\alpha,\beta)$-regret bound of $O(nd^{1/3}T^{2/3})$, where $n$ is the dimensionality and $d$ is the maximum delay. However, its regret bound relies on the maximum delay and is thus sensitive to irregular delays. Additionally, it couples the effects of delays and bandit feedback, as its bound is the product of the delay term and the $O(nT^{2/3})$ regret bound in the bandit setting without delayed feedback. In this paper, we develop two algorithms to address these limitations, respectively. Firstly, we propose a novel method, namely DBGD-NF, which employs the one-point gradient estimator and utilizes all the available estimated gradients in each…
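The idea of combining a one-point gradient estimator with "all the available estimated gradients" can be illustrated with a small simulation. This is a hypothetical sketch under stated assumptions, not the authors' DBGD-NF. It assumes the domain is the box $[0,1]^n$, uses plain projected gradient descent, and uses the classic one-point estimator $\frac{n}{\delta} f(x+\delta u)\,u$ with $u$ uniform on the unit sphere. It also assumes that in each round every estimate whose feedback arrives in that round is applied. The function names (`one_point_gradient`, `delayed_bandit_gd`) and parameters (`eta`, `delta`, `seed`) are made up for illustration. Weak DR-submodularity and the paper's actual step sizes are not modelled.

```python
import numpy as np


def one_point_gradient(loss_value, u, n, delta):
    """Classic one-point estimator (n / delta) * f(x + delta * u) * u, u uniform on the unit sphere."""
    return (n / delta) * loss_value * u


def delayed_bandit_gd(loss_fns, delays, n, eta, delta, seed=0):
    """Hypothetical sketch: projected bandit gradient descent on the (assumed) box [0, 1]^n.

    delays[t] >= 0 is the delay of round t's feedback. In each round, every gradient
    estimate whose feedback arrives in that round is summed into one update.
    Returns the played points and the number of estimates actually used.
    """
    rng = np.random.default_rng(seed)
    T = len(loss_fns)
    x = np.full(n, 0.5)          # start at the centre of the box
    pending = {}                 # arrival round -> list of gradient estimates
    played, n_used = [], 0
    for t in range(T):
        u = rng.normal(size=n)
        u /= np.linalg.norm(u)   # uniform random direction on the unit sphere
        y = x + delta * u        # point actually played in round t
        played.append(y)
        # The scalar loss is computed now, but the learner only sees it at round t + delays[t].
        value = loss_fns[t](y)
        g = one_point_gradient(value, u, n, delta)
        pending.setdefault(t + delays[t], []).append(g)
        arrived = pending.pop(t, [])   # all estimates whose feedback arrives in round t
        if arrived:
            x = x - eta * np.sum(arrived, axis=0)
            n_used += len(arrived)
        # Project onto [delta, 1 - delta]^n so that x + delta * u stays inside [0, 1]^n.
        x = np.clip(x, delta, 1 - delta)
    return np.array(played), n_used
```

Keeping a dictionary keyed by arrival round means irregular delays only change how many estimates are summed in a given round. The learner never waits for the slowest outstanding feedback before updating. Estimates that would arrive after the horizon are simply never used.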