Efficient Online Bandit Multiclass Learning with $\tilde{O}(\sqrt{T})$ Regret
Alina Beygelzimer, Francesco Orabona, Chicheng Zhang

TL;DR
This paper introduces an efficient second-order algorithm for online multiclass bandit learning that achieves near-optimal regret bounds and performs well in experiments, addressing a longstanding open problem.
Contribution
The paper proposes a novel second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for bandit multiclass problems, covering a range of loss functions.
Findings
Achieves $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret bound.
Performs favorably against previous algorithms in experiments.
Addresses an open problem in bandit multiclass learning.
Abstract
We present an efficient second-order algorithm with $\tilde{O}(\frac{1}{\eta}\sqrt{T})$ regret for the bandit online multiclass problem. The regret bound holds simultaneously with respect to a family of loss functions parameterized by $\eta$, for a range of $\eta$ restricted by the norm of the competitor. The family of loss functions ranges from hinge loss ($\eta = 0$) to squared hinge loss ($\eta = 1$). This provides a solution to the open problem of (J. Abernethy and A. Rakhlin. An efficient bandit algorithm for $\sqrt{T}$-regret in online multiclass prediction? In COLT, 2009). We test our algorithm experimentally, showing that it also performs favorably against earlier algorithms.
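To make the bandit feedback setting concrete, here is a minimal sketch of one round of the protocol in Python. It is not the second-order algorithm from this paper. As a stand-in it uses a first-order, Banditron-style update (Kakade, Shalev-Shwartz, and Tewari, 2008), chosen only because it is short. The function names, the synthetic data generator, and the parameter values (`gamma`, `T`, `k`, `d`) are assumptions made for illustration.

```python
import numpy as np


def banditron_step(W, x, y, gamma, rng):
    """One bandit multiclass round with a Banditron-style update (illustrative)."""
    k = W.shape[0]
    y_hat = int(np.argmax(W @ x))            # greedy label under the current model
    probs = np.full(k, gamma / k)            # spread exploration mass uniformly
    probs[y_hat] += 1.0 - gamma              # remaining mass goes to the greedy label
    y_tilde = int(rng.choice(k, p=probs))    # label actually played
    correct = (y_tilde == y)                 # the only feedback the learner receives
    # Importance-weighted estimate of the full-information perceptron update.
    U = np.zeros_like(W)
    if correct:
        U[y_tilde] += x / probs[y_tilde]
    U[y_hat] -= x
    return W + U, y_tilde, correct


def run(T=2000, k=3, d=5, gamma=0.1, seed=0):
    """Simulate T rounds on synthetic, linearly labeled data (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    W_true = rng.normal(size=(k, d))         # hypothetical ground-truth labeler
    W = np.zeros((k, d))
    mistakes = 0
    for _ in range(T):
        x = rng.normal(size=d)
        x /= np.linalg.norm(x)
        y = int(np.argmax(W_true @ x))
        W, _, correct = banditron_step(W, x, y, gamma, rng)
        mistakes += not correct
    return mistakes
```

The learner never sees the true label `y` directly. It only learns whether the label it played was correct, which is what separates this setting from full-information online multiclass learning.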
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
