Taking a hint: How to leverage loss predictors in contextual bandits?
Chen-Yu Wei, Haipeng Luo, Alekh Agarwal

TL;DR
This paper explores how loss predictors can improve regret bounds in contextual bandits, revealing new bounds and algorithms for various settings, including adversarial and stochastic environments.
Contribution
It provides the first comprehensive analysis of leveraging loss predictors in contextual bandits, establishing tight bounds and novel algorithms for different scenarios.
Findings
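When the total prediction error E is known, the optimal regret is O(min{√T, √E T^{1/4}}).
When E is unknown, that bound cannot be achieved, but regret O(√E T^{1/3}) is achievable.
With multiple predictors, a linear dependence of the regret on the number of predictors is necessary.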
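To get a feel for the two regret rates listed above, the following minimal sketch evaluates them numerically. It is only an illustration of the formulas, not the paper's algorithm: constants and logarithmic factors are dropped, and the function names `known_error_rate` and `unknown_error_rate` are hypothetical.

```python
import math


def known_error_rate(T: int, E: float) -> float:
    """Rate min{sqrt(T), sqrt(E) * T^(1/4)} for known total error E.

    Assumption: constants and log factors are ignored; this is the
    shape of the bound only.
    """
    return min(math.sqrt(T), math.sqrt(E) * T ** 0.25)


def unknown_error_rate(T: int, E: float) -> float:
    """Rate sqrt(E) * T^(1/3) for unknown total error E (same caveats)."""
    return math.sqrt(E) * T ** (1.0 / 3.0)


if __name__ == "__main__":
    T = 10_000
    print(f"T = {T}, sqrt(T) = {math.sqrt(T):.1f}")
    for E in (1, 25, 100, 400):
        print(
            f"E = {E:>4}: known-E rate = {known_error_rate(T, E):7.1f}, "
            f"unknown-E rate = {unknown_error_rate(T, E):7.1f}"
        )
```

By simple arithmetic on these expressions, the known-E rate improves on √T exactly when E < √T, while the unknown-E rate √E T^{1/3} is below √T only when E < T^{1/3}.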
Abstract
We initiate the study of learning in contextual bandits with the help of loss predictors. The main question we address is whether one can improve over the minimax regret O(√T) for learning over T rounds, when the total error E of the predictor is relatively small. We provide a complete answer to this question, including upper and lower bounds for various settings: adversarial versus stochastic environments, known versus unknown E, and single versus multiple predictors. We show several surprising results, such as 1) the optimal regret is O(min{√T, √E T^{1/4}}) when E is known, a sharp contrast to the standard and better bound O(√E) for non-contextual problems (such as multi-armed bandits); 2) the same bound cannot be achieved if E is unknown,…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Consumer Market Behavior and Pricing · Auction Theory and Applications
