On the Hardness of Bandit Learning
Nataly Brukhim, Aldo Pacchiano, Miroslav Dudik, Robert Schapire

TL;DR
This paper explores the fundamental limits of bandit learning, revealing that classical complexity measures do not characterize learnability and demonstrating computational hardness results for certain reward classes.
Contribution
It shows that traditional combinatorial dimensions do not determine bandit learnability and establishes computational hardness for specific reward function classes.
Findings
Classical complexity measures fail to characterize bandit learnability.
Identifying an optimal action is computationally hard for certain reward classes.
Some operations on these classes admit efficient algorithms even though the overall task is hard.
Abstract
We study the task of bandit learning, also known as best-arm identification, under the assumption that the true reward function f belongs to a known, but arbitrary, function class F. We seek a general theory of bandit learnability, akin to the PAC framework for classification. Our investigation is guided by the following two questions: (1) which classes F are learnable, and (2) how they are learnable. For example, in the case of binary PAC classification, learnability is fully determined by a combinatorial dimension, the VC dimension, and can be attained via a simple algorithmic principle, namely, empirical risk minimization (ERM). In contrast to classical learning-theoretic results, our findings reveal limitations of learning in structured bandits, offering insights into the boundaries of bandit learnability. First, for the question of "which", we show that the paradigm of identifying…
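To make the setting concrete, the following is a minimal sketch of best-arm identification when the reward function is known to lie in a finite class F. It is an illustration only, not the paper's algorithm. It assumes a finite action set, noise-free reward queries, and that the true function is in the class; the names `best_arms`, `identify_best_arm`, and the dict-based representation of reward functions are hypothetical. The learner queries actions in a fixed order, discards functions inconsistent with the observed rewards, and stops as soon as every surviving function shares an optimal action.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

# Hypothetical representation: a reward function maps each action to its reward.
RewardFn = Dict[int, float]


def best_arms(f: RewardFn, actions: Sequence[int]) -> Set[int]:
    """Return the set of actions that maximize f."""
    top = max(f[a] for a in actions)
    return {a for a in actions if f[a] == top}


def identify_best_arm(
    function_class: Sequence[RewardFn],
    actions: Sequence[int],
    query: Callable[[int], float],
) -> Tuple[int, int]:
    """Return (optimal action, number of queries used).

    Assumes noise-free queries and that the true reward function
    belongs to function_class (realizability).
    """
    survivors = list(function_class)
    n_queries = 0
    # One extra pass (None) so the stopping check also runs after the last query.
    for a in list(actions) + [None]:
        if not survivors:
            raise ValueError("no function in the class matches the observations")
        # Stop once all surviving functions agree on some optimal action.
        common = set.intersection(*(best_arms(f, actions) for f in survivors))
        if common:
            return min(common), n_queries
        if a is None:
            break
        reward = query(a)
        n_queries += 1
        # Keep only functions consistent with the observed reward.
        survivors = [f for f in survivors if f[a] == reward]
    raise RuntimeError("unreachable under the realizability assumption")


if __name__ == "__main__":
    actions = [0, 1, 2]
    # A toy class: each function rewards exactly one action.
    F = [{0: 1.0, 1: 0.0, 2: 0.0},
         {0: 0.0, 1: 1.0, 2: 0.0},
         {0: 0.0, 1: 0.0, 2: 1.0}]
    true_f = F[2]
    print(identify_best_arm(F, actions, lambda a: true_f[a]))
```

In this toy class every function has a different optimal action, so the fixed-order learner may need to query almost every action before the survivors agree. How many queries a class requires, and how much computation, depends on its structure.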
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
