Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks
Naresh Manwani, Mudit Agarwal

TL;DR
Delaytron is an online algorithm designed for multiclass classification with delayed bandit feedbacks, achieving regret bounds that adapt to unknown delays and missing feedback, validated through experiments.
Contribution
We introduce Delaytron, an efficient online algorithm for multiclass classification with unknown delays and missing feedback, providing adaptive regret bounds and empirical validation.
Findings
Achieves a provable regret bound when the delays are known
Uses doubling trick for unknown delays and missing feedback
Demonstrates effectiveness through experiments on various datasets
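The doubling trick mentioned in the findings can be illustrated generically. The sketch below is not the authors' procedure: it assumes a hypothetical learner whose step size depends on a budget for some cumulative quantity (for example, total observed delay), starts from a guess, and doubles the guess whenever the running total exceeds it. The function name `doubling_step_sizes`, the parameter `initial_budget`, and the step-size rule `1 / sqrt(budget)` are all assumptions made for illustration.

```python
import math


def doubling_step_sizes(increments, initial_budget=1.0):
    """Return per-round step sizes under a generic doubling trick.

    increments: per-round contributions to the unknown cumulative quantity
                (hypothetical stand-in for, e.g., observed delays).
    The 1/sqrt(budget) step size is an illustrative assumption.
    """
    budget = initial_budget
    total = 0.0
    epochs = 0  # number of times the guess was doubled
    etas = []
    for inc in increments:
        total += inc
        # Double the guess until it covers the running total again.
        while total > budget:
            budget *= 2.0
            epochs += 1
        etas.append(1.0 / math.sqrt(budget))
    return etas, epochs


etas, epochs = doubling_step_sizes([1, 1, 1, 1])
```

Because the budget only grows, the step sizes are non-increasing, and the number of doublings is logarithmic in the final total, which is what keeps the overhead of not knowing the total in advance small.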
Abstract
In this paper, we present an online algorithm called Delaytron for learning multiclass classifiers using delayed bandit feedback. The sequence of feedback delays is unknown to the algorithm. At the $t$-th round, the algorithm observes an example, predicts a label, and receives the bandit feedback only $d_t$ rounds later. When $t + d_t > T$, where $T$ is the total number of rounds, we consider the feedback for the $t$-th round to be missing. We show a regret bound for the proposed algorithm when the loss for each missing sample is upper bounded by a constant. In the case when the loss for missing samples is not upper bounded, the regret achieved by Delaytron is $\mathcal{O}\left(\sqrt{\frac{2…
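The feedback protocol described in the abstract can be sketched as a small simulation: the learner predicts in every round, the one-bit bandit feedback for a round arrives some number of rounds later, and feedback that would arrive after the final round is treated as missing. This is a minimal sketch, not the Delaytron algorithm itself. The update rule is a Banditron-style perceptron update chosen for illustration, and the names `run_delayed_bandit`, `gamma`, and `arrivals` are assumptions. Rounds are 0-indexed here, so the missing condition becomes `t + delays[t] >= T`.

```python
import random
from collections import defaultdict


def run_delayed_bandit(examples, labels, delays, num_classes, gamma=0.1, seed=0):
    """Simulate multiclass learning with delayed bandit feedback (illustrative)."""
    rng = random.Random(seed)
    T, dim = len(examples), len(examples[0])
    W = [[0.0] * dim for _ in range(num_classes)]  # one weight row per class
    arrivals = defaultdict(list)  # arrival round -> pending feedback records
    missing, mistakes = [], 0

    for t in range(T):
        x = examples[t]
        scores = [sum(w * v for w, v in zip(row, x)) for row in W]
        y_hat = max(range(num_classes), key=scores.__getitem__)
        # Explore uniformly with probability gamma, otherwise play y_hat.
        probs = [gamma / num_classes] * num_classes
        probs[y_hat] += 1.0 - gamma
        y_tilde = rng.choices(range(num_classes), weights=probs)[0]
        correct = y_tilde == labels[t]
        mistakes += not correct

        arrive = t + delays[t]
        if arrive >= T:
            missing.append(t)  # feedback never reaches the learner
        else:
            arrivals[arrive].append((x, y_hat, y_tilde, probs[y_tilde], correct))

        # Apply every feedback record that arrives in this round.
        for xs, ys_hat, ys_tilde, p, ok in arrivals.pop(t, []):
            for i in range(dim):
                W[ys_hat][i] -= xs[i]
                if ok:
                    W[ys_tilde][i] += xs[i] / p  # importance-weighted correction
    return W, mistakes, missing


X = [[1.0, 0.0], [0.0, 1.0]] * 5
y = [0, 1] * 5
W, mistakes, missing = run_delayed_bandit(X, y, delays=[3] * 10, num_classes=2)
```

With a constant delay of 3 over 10 rounds, the last three rounds never deliver feedback, so `missing` contains rounds 7, 8, and 9.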
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Algorithms
