Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
Dylan J. Foster, Alexander Rakhlin

TL;DR
This paper introduces a universal, optimal reduction from contextual bandits to online regression, yielding efficient algorithms with theoretical guarantees for a wide range of function classes, including nonparametric classes, even when contexts are chosen adversarially.
Contribution
It provides the first universal, minimax-optimal reduction from contextual bandits to online regression, with no overhead in runtime or memory, applicable to general, possibly nonparametric function classes and to adversarially chosen contexts.
Findings
Achieves minimax-optimal rates for contextual bandits with general function classes whenever the online regression oracle itself attains the optimal rate.
Requires no distributional assumptions beyond realizability.
Works efficiently even with adversarially chosen contexts.
Abstract
A fundamental challenge in contextual bandits is to develop flexible, general-purpose algorithms with computational requirements no worse than classical supervised learning tasks such as classification and regression. Algorithms based on regression have shown promising empirical success, but theoretical guarantees have remained elusive except in special cases. We provide the first universal and optimal reduction from contextual bandits to online regression. We show how to transform any oracle for online regression with a given value function class into an algorithm for contextual bandits with the induced policy class, with no overhead in runtime or memory requirements. We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for…
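The reduction described in the abstract can be illustrated with a small sketch. The paper's algorithm (SquareCB) turns the oracle's per-action loss predictions into a sampling distribution using inverse gap weighting: actions predicted to be much worse than the best one are sampled with proportionally smaller probability. Everything else below is an assumption made for illustration and is not taken from the paper: the `PerArmSGDOracle` class, the synthetic linear loss model, the noise level, and the fixed value of `gamma` (the paper ties this parameter to the horizon and to the oracle's regret bound; the constant used here is arbitrary).

```python
import numpy as np


def inverse_gap_weighting(predicted_losses, gamma):
    """Map predicted losses to action probabilities via inverse gap weighting."""
    y = np.asarray(predicted_losses, dtype=float)
    k = y.size
    best = int(np.argmin(y))
    # Each non-greedy action gets probability 1 / (K + gamma * gap) <= 1 / K.
    probs = 1.0 / (k + gamma * (y - y[best]))
    probs[best] = 0.0
    # The greedy action receives the remaining mass, which is at least 1 / K.
    probs[best] = 1.0 - probs.sum()
    return probs


class PerArmSGDOracle:
    """Hypothetical online regression oracle: one linear model per action,
    updated by a gradient step on the squared loss."""

    def __init__(self, n_actions, dim, lr=0.1):
        self.w = np.zeros((n_actions, dim))
        self.lr = lr

    def predict(self, x):
        # Predicted loss of every action for context x.
        return self.w @ x

    def update(self, x, action, loss):
        # Only the played action's model sees feedback (bandit setting).
        err = self.w[action] @ x - loss
        self.w[action] -= self.lr * err * x


def run_bandit(T=2000, n_actions=3, dim=4, gamma=50.0, seed=0):
    """Run the reduction on a synthetic realizable problem; return total regret."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, size=(n_actions, dim))  # assumed true loss model
    oracle = PerArmSGDOracle(n_actions, dim)
    total_regret = 0.0
    for _ in range(T):
        x = rng.uniform(0.0, 1.0, size=dim)
        x /= np.linalg.norm(x)
        probs = inverse_gap_weighting(oracle.predict(x), gamma)
        action = rng.choice(n_actions, p=probs)
        mean_losses = theta @ x
        loss = mean_losses[action] + 0.05 * rng.standard_normal()
        oracle.update(x, action, loss)
        total_regret += mean_losses[action] - mean_losses.min()
    return total_regret


if __name__ == "__main__":
    print("total regret:", run_bandit())
```

The bandit loop only calls `predict` and `update` on the oracle, so any online regression method with the same two methods could be swapped in without changing the rest of the code. That is the sense in which the reduction adds no runtime or memory overhead beyond the oracle itself.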
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques
