Bandit and Delayed Feedback in Online Structured Prediction

Yuki Shibukawa; Taira Tsuchiya; Shinsaku Sakaue; Kenji Yamanishi

arXiv:2502.18709 · cs.LG · January 6, 2026

TL;DR

This paper develops algorithms for online structured prediction under bandit and delayed feedback and proves surrogate regret bounds for them, including a bandit-feedback bound that does not depend on the size of the output set.

Contribution

The paper introduces algorithms for bandit feedback whose surrogate regret bounds are independent of the size of the output set, and it extends the analysis to delayed feedback.

Findings

01

Achieves an $O(T^{2/3})$ surrogate regret bound with bandit feedback, independent of the output-set size

02

Provides algorithms with regret bounds for delayed feedback in various settings

03

Numerical experiments compare performance of proposed and existing algorithms

Abstract

Online structured prediction is the task of sequentially predicting outputs with complex structures based on inputs and past observations, encompassing online classification. Recent studies showed that in the full-information setting, we can achieve finite bounds on the surrogate regret, i.e., the extra target loss relative to the best possible surrogate loss. In practice, however, full-information feedback is often unrealistic as it requires immediate access to the whole structure of complex outputs. Motivated by this, we propose algorithms that work with less demanding feedback: bandit and delayed feedback. For bandit feedback, by using a standard inverse-weighted gradient estimator, we achieve a surrogate regret bound of $O(\sqrt{KT})$ for the time horizon $T$ and the size of the output set $K$. However, $K$ can be extremely large when outputs are…
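The inverse-weighted gradient estimator mentioned in the abstract can be illustrated with a minimal sketch for plain multiclass classification. This is not the paper's algorithm. It assumes a logistic surrogate loss over a score vector `theta`, a sampling distribution mixed with uniform exploration, and feedback limited to whether the sampled prediction was correct. All function names and the `explore` parameter are hypothetical. The sketch checks, by enumerating every possible prediction, that the estimator's expectation equals the full-information gradient.

```python
import numpy as np

def softmax(theta):
    """Numerically stable softmax over a 1-D score vector."""
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()

def sampling_distribution(theta, explore=0.1):
    """Mix softmax with uniform so each class has probability >= explore / K.
    The `explore` rate is an illustrative assumption."""
    k = theta.shape[0]
    return (1.0 - explore) * softmax(theta) + explore / k

def estimated_gradient(theta, p, predicted, correct):
    """Inverse-weighted estimate of the logistic-loss gradient w.r.t. theta.
    Uses only the sampled class and the bit `correct` (bandit feedback)."""
    label_estimate = np.zeros_like(theta)
    if correct:
        # Dividing by p[predicted] makes the label estimate unbiased.
        label_estimate[predicted] = 1.0 / p[predicted]
    return softmax(theta) - label_estimate

def expected_estimate(theta, p, true_label):
    """Exact expectation over predicted ~ p, computed by enumeration."""
    return sum(
        p[i] * estimated_gradient(theta, p, i, i == true_label)
        for i in range(len(theta))
    )

theta = np.array([0.5, -1.0, 2.0, 0.0])  # hypothetical scores for K = 4 classes
true_label = 1
p = sampling_distribution(theta)

# Full-information gradient of the logistic loss: softmax(theta) - one_hot(y).
full_info_gradient = softmax(theta) - np.eye(len(theta))[true_label]
bandit_mean = expected_estimate(theta, p, true_label)
```

In this sketch the weight `1.0 / p[predicted]` can be as large as `K / explore`, which gives an informal sense of how the size of the output set can enter bounds built on this kind of estimator.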

Videos

Bandit and Delayed Feedback in Online Structured Prediction · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques