Online Multiclass Boosting with Bandit Feedback
Daniel T. Zhang, Young Hun Jung, Ambuj Tewari

TL;DR
This paper introduces online multiclass boosting algorithms for bandit feedback, where the learner is told only whether its prediction was correct. It builds an unbiased loss estimate from randomized predictions and uses it to extend full information boosting algorithms to this setting.
Contribution
The paper develops an unbiased loss estimate based on randomized prediction and uses it to extend two full information boosting algorithms (Jung et al., 2017) to the bandit setting, with asymptotic error bounds that match the originals.
Findings
Asymptotic error bounds match those of the full information algorithms
Restricted feedback is paid for with a larger sample complexity
Empirical performance is comparable to an existing bandit boosting algorithm that is restricted to binary weak learners
Abstract
We present online boosting algorithms for multiclass classification with bandit feedback, where the learner only receives feedback about the correctness of its prediction. We propose an unbiased estimate of the loss using a randomized prediction, allowing the model to update its weak learners with limited information. Using the unbiased estimate, we extend two full information boosting algorithms (Jung et al., 2017) to the bandit setting. We prove that the asymptotic error bounds of the bandit algorithms exactly match their full information counterparts. The cost of restricted feedback is reflected in the larger sample complexity. Experimental results also support our theoretical findings, and the performance of the proposed models is comparable to that of an existing bandit boosting algorithm, which is limited to using binary weak learners.
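The idea of an unbiased estimate built from a randomized prediction can be illustrated with a generic importance-weighting sketch. This is not the paper's exact estimator: the function names, the uniform exploration scheme, and the exploration rate `rho` are assumptions made for illustration. The learner draws its prediction from a distribution that mostly follows its preferred label, and when told the prediction was correct, it reweights that observation by the inverse of the probability of having drawn it.

```python
import numpy as np

def exploration_distribution(y_hat, k, rho):
    """Assumed scheme: play y_hat with prob. 1 - rho, else a uniform label."""
    p = np.full(k, rho / k)
    p[y_hat] += 1.0 - rho
    return p

def estimate_label_indicator(y_tilde, correct, p):
    """Importance-weighted estimate of the one-hot true-label vector.

    y_tilde : label actually predicted (drawn from p)
    correct : bandit feedback, True iff y_tilde equals the true label
    """
    est = np.zeros(len(p))
    if correct:
        # Only a correct guess reveals the label; reweight by 1 / p.
        est[y_tilde] = 1.0 / p[y_tilde]
    return est

def expected_estimate(y_true, y_hat, k, rho):
    """Exact expectation of the estimate over the random draw of y_tilde."""
    p = exploration_distribution(y_hat, k, rho)
    total = np.zeros(k)
    for y_tilde in range(k):
        feedback = (y_tilde == y_true)
        total += p[y_tilde] * estimate_label_indicator(y_tilde, feedback, p)
    return total
```

Averaged over the random draw, the estimate equals the one-hot vector of the true label, so any loss that is linear in that vector can be estimated without bias. The price is variance: the `1 / p` factor grows as `rho` shrinks, which is consistent with the larger sample complexity noted in the abstract.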
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
