Fast Rates for Bandit PAC Multiclass Classification
Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran

TL;DR
This paper introduces a new algorithm for multiclass PAC learning with bandit feedback whose sample complexity matches the full-information rate in its leading term, resolving an open question about the cost of bandit feedback.
Contribution
The paper presents a novel learning algorithm with improved sample complexity bounds for agnostic multiclass PAC learning under bandit feedback, extending to general hypothesis classes and establishing optimal rates.
Findings
Achieves sample complexity of $O((\text{poly}(K) + 1/\epsilon^2) \times \log(|H|/\delta))$
Matches the optimal rate in the full-information setting for general classes
Shows that the multiplicative cost of bandit feedback, relative to full information, is only $O(1)$ in the agnostic case as $\epsilon$ approaches zero
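The leading-term behavior in these findings can be checked numerically. The sketch below is illustrative only: it drops all constants, and `poly_degree` is a hypothetical placeholder, since this summary does not state the exponent hidden in poly(K). The full-information reference rate $\log(|H|/\delta)/\epsilon^2$ is assumed here as the standard agnostic rate for a finite class.

```python
import math


def bandit_bound(num_labels, eps, class_size, delta, poly_degree=2):
    """(poly(K) + 1/eps^2) * log(|H|/delta), constants dropped.

    poly_degree is an assumed stand-in for the unspecified poly(K) exponent.
    """
    return (num_labels ** poly_degree + 1.0 / eps ** 2) * math.log(class_size / delta)


def full_info_bound(eps, class_size, delta):
    """Assumed full-information agnostic rate: log(|H|/delta) / eps^2."""
    return (1.0 / eps ** 2) * math.log(class_size / delta)


def bandit_to_full_info_ratio(num_labels, eps, class_size, delta, poly_degree=2):
    """Multiplicative gap between the two bounds at a given accuracy eps."""
    return (bandit_bound(num_labels, eps, class_size, delta, poly_degree)
            / full_info_bound(eps, class_size, delta))


if __name__ == "__main__":
    # The ratio shrinks toward 1 as eps decreases with K held fixed.
    for eps in (0.1, 0.01, 0.001):
        print(eps, bandit_to_full_info_ratio(10, eps, 1000, 0.05))
```

Under these assumptions the ratio simplifies to $1 + K^{p}\epsilon^2$ for exponent $p$, so for any fixed $K$ it tends to 1 as $\epsilon$ approaches zero, which is the sense in which the cost of bandit feedback is $O(1)$.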
Abstract
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(\epsilon,\delta)$-PAC version of the problem, with sample complexity of $O((\text{poly}(K) + 1/\epsilon^2) \log(|H|/\delta))$ for any finite hypothesis class $H$. In terms of the leading dependence on $\epsilon$, this improves upon existing bounds for the problem, which are of the form $O(K/\epsilon^2)$. We also provide an extension of this result to general classes and establish similar sample complexity bounds in which $\log|H|$ is replaced by the Natarajan dimension. This matches the optimal rate in the full-information version of the problem and resolves an open question studied…
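To make the feedback model concrete, here is a minimal Python sketch of the bandit interaction: the learner predicts a label and observes only whether that prediction was correct. It pairs the protocol with the textbook baseline of uniform random guessing plus importance weighting. This is not the paper's algorithm, and the function names are hypothetical.

```python
import random


def bandit_feedback(prediction, true_label):
    """The only signal the learner sees: was the predicted label correct?"""
    return prediction == true_label


def estimate_accuracies(hypotheses, samples, num_labels, rng):
    """Baseline estimator (an assumption for illustration, not the paper's method).

    For each example, guess a label uniformly at random. If the guess is
    correct, every hypothesis that agrees with the guess is credited with
    weight num_labels, which makes the estimate of its accuracy unbiased.
    """
    totals = [0.0] * len(hypotheses)
    for x, y in samples:
        guess = rng.randrange(num_labels)
        if bandit_feedback(guess, y):
            for i, h in enumerate(hypotheses):
                if h(x) == guess:
                    totals[i] += num_labels
    return [t / len(samples) for t in totals]


if __name__ == "__main__":
    K = 3
    data = [(x, x % K) for x in range(30000)]    # synthetic labeled stream
    candidates = [lambda x: x % K, lambda x: 0]  # true accuracies: 1 and 1/K
    print(estimate_accuracies(candidates, data, K, random.Random(0)))
```

Each importance-weighted term takes values in $\{0, K\}$, so its variance grows with $K$. That is one way to see why baselines of this kind pay a factor of $K$ in the $1/\epsilon^2$ term.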
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Algorithms
