Bandit and Delayed Feedback in Online Structured Prediction
Yuki Shibukawa, Taira Tsuchiya, Shinsaku Sakaue, Kenji Yamanishi

TL;DR
This paper develops algorithms for online structured prediction under bandit and delayed feedback, providing regret bounds that improve over existing methods and are applicable to complex output spaces.
Contribution
It introduces new algorithms for bandit feedback whose surrogate regret bounds are independent of the size of the output set, and extends the analysis to delayed feedback.
Findings
Achieves an $O(T^{2/3})$ surrogate regret bound under bandit feedback, independent of the output set size
Provides algorithms with regret bounds for delayed feedback in various settings
Numerical experiments compare performance of proposed and existing algorithms
Abstract
Online structured prediction is the task of sequentially predicting outputs with complex structures based on inputs and past observations, encompassing online classification. Recent studies showed that in the full-information setting, we can achieve finite bounds on the surrogate regret, i.e., the extra target loss relative to the best possible surrogate loss. In practice, however, full-information feedback is often unrealistic as it requires immediate access to the whole structure of complex outputs. Motivated by this, we propose algorithms that work with less demanding feedback, bandit and delayed feedback. For bandit feedback, by using a standard inverse-weighted gradient estimator, we achieve a surrogate regret bound of $O(\sqrt{KT})$ for the time horizon $T$ and the size of the output set $K$. However, $K$ can be extremely large when outputs are…
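To make the "inverse-weighted" idea in the abstract concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it only illustrates the generic importance-weighting trick for a small multiclass problem, where the learner samples a prediction from a distribution `probs` and observes only whether that prediction was correct. The function names and the toy distribution are assumptions made for illustration. The estimator requires every output to have strictly positive sampling probability.

```python
import random


def sample_prediction(probs, rng):
    """Draw an output index from the sampling distribution `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def inverse_weighted_estimate(probs, predicted, correct):
    """Estimate the one-hot vector of the true output from bandit feedback.

    Only `correct` (did the prediction match the true output?) is observed.
    Assumes probs[predicted] > 0.
    """
    estimate = [0.0] * len(probs)
    if correct:
        # Reweight by the inverse of the probability of having drawn this output.
        estimate[predicted] = 1.0 / probs[predicted]
    return estimate


def expected_estimate(probs, true_output):
    """Exact expectation of the estimator over the random prediction."""
    k = len(probs)
    mean = [0.0] * k
    for predicted in range(k):
        est = inverse_weighted_estimate(probs, predicted, predicted == true_output)
        for i in range(k):
            mean[i] += probs[predicted] * est[i]
    return mean


# Toy example (hypothetical numbers): three outputs, true output is index 1.
probs = [0.5, 0.3, 0.2]
true_output = 1
mean = expected_estimate(probs, true_output)

# One simulated round of bandit feedback.
rng = random.Random(0)
predicted = sample_prediction(probs, rng)
estimate = inverse_weighted_estimate(probs, predicted, predicted == true_output)
```

In expectation the estimate equals the one-hot vector of the true output, which is what lets it stand in for full-information feedback inside a gradient update. Its magnitude scales with the inverse sampling probability, so with $K$ outputs and roughly uniform exploration a single estimate can be as large as $K$. This is one intuition for why bounds built on such estimators tend to depend on the size of the output set.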
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
