A Contextual Bandit Bake-off
Alberto Bietti, Alekh Agarwal, John Langford

TL;DR
This paper empirically evaluates contextual bandit algorithms on a large number of supervised learning datasets. An optimism-based method performs best overall, a simple greedy approach is a close second, and a robust variant of Online Cover follows.
Contribution
It provides a comprehensive empirical comparison of contextual bandit algorithms, highlighting practical performance and robustness of recent methods and components.
Findings
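The optimism-under-uncertainty method of Foster et al. (2018) performs best overall.
A simple greedy baseline, which explores only implicitly through the diversity of contexts, is a surprisingly close second.
A variant of Online Cover (Agarwal et al., 2014) ranks third; it tends to be more conservative but is robust to problem specification by design.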
Abstract
Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to empirically evaluate contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning. We find that a recent method (Foster et al., 2018) using optimism under uncertainty works the best overall. A surprisingly close second is a simple greedy baseline that only explores implicitly through the diversity of contexts, followed by a variant of Online Cover (Agarwal et al., 2014) which tends to be more conservative but robust to problem specification by design. Along the way, we…
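The evaluation idea in the abstract, turning a supervised dataset into a simulated bandit problem, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the synthetic dataset, the per-action linear regressors trained by SGD, the 0/1 loss encoding, the learning rate, and the names `make_dataset`, `predict`, and `run_greedy` are all assumptions made for the example. The learner sees a context, picks one action, and observes only the loss of that action; the greedy policy always picks the action with the lowest predicted loss and adds no explicit exploration.

```python
import random


def make_dataset(n, n_actions, dim, seed=0):
    """Hypothetical synthetic multiclass data: class y shifts coordinate y by 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randrange(n_actions)
        x = [rng.gauss(1.0 if j == y else 0.0, 0.3) for j in range(dim)]
        data.append((x, y))
    return data


def predict(w, x):
    """Linear loss estimate for one action."""
    return sum(wi * xi for wi, xi in zip(w, x))


def run_greedy(data, n_actions, dim, lr=0.1):
    """Simulate bandit feedback on supervised data with a greedy learner.

    Returns the average 0/1 loss incurred over the stream (progressive loss).
    """
    # One linear regressor per action (an assumption of this sketch).
    weights = [[0.0] * dim for _ in range(n_actions)]
    total_loss = 0.0
    for x, y in data:
        preds = [predict(w, x) for w in weights]
        # Greedy choice: lowest predicted loss, ties broken by lowest index.
        a = min(range(n_actions), key=lambda k: preds[k])
        # Bandit feedback: only the chosen action's loss is revealed.
        loss = 0.0 if a == y else 1.0
        total_loss += loss
        # SGD step on squared error, for the chosen action only.
        err = preds[a] - loss
        weights[a] = [wi - lr * err * xi for wi, xi in zip(weights[a], x)]
    return total_loss / len(data)


if __name__ == "__main__":
    stream = make_dataset(n=2000, n_actions=3, dim=3, seed=0)
    print("average loss of greedy:", run_greedy(stream, n_actions=3, dim=3))
```

Because the true label is known for every example, the loss of any chosen action can be computed offline, which is what makes supervised datasets usable as bandit benchmarks. An exploring algorithm would differ only in how `a` is selected.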
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Algorithms
