On Characterizing Learnability for Adversarial Noisy Bandits
Steve Hanneke, Kun Wang

TL;DR
This paper characterizes when adversarial noisy bandit problems are learnable, using a convexified generalized maximin volume, and examines how the characterization carries over to oblivious and adaptive adversaries and to countable and uncountable arm spaces.
Contribution
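Introduces a characterization of learnability in adversarial noisy bandits based on a convexified generalized maximin volume. The characterization covers oblivious adversaries and adaptive adversaries with countable arm spaces; for uncountable arm spaces, the paper proposes the distribution covering number as a candidate measure.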
Findings
Characterizes learnability for oblivious adversaries using convexified generalized maximin volume.
Shows the same quantity characterizes learnability for adaptive adversaries with countable arm spaces.
Proposes the distribution covering number as a potential measure for uncountable arm spaces.
Abstract
We study adversarial noisy bandits given a known function class. In each round, the adversary selects a function from the class, the learner chooses an arm, and the learner then observes a noisy reward determined by the chosen arm and the selected function. The goal is to minimize the cumulative regret, defined as the difference between the learner's performance and that of the best fixed arm in hindsight over all rounds. We say that a function class is learnable if there exists an algorithm achieving sublinear regret. Our main results concern characterizing learnability. The main quantity appearing in our characterization is a convexified variant of the generalized maximin volume introduced by Hanneke and Wang (2025). For oblivious adversaries, we characterize learnability in terms of this convexified generalized maximin volume. For adaptive adversaries, we show…
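The round-by-round protocol and the regret it defines can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the paper: a finite arm set, an oblivious adversary whose functions are fixed in advance, Gaussian noise, and regret measured on mean rewards. The names `play` and `always_a` are hypothetical and chosen for illustration.

```python
import random


def play(functions, choose_arm, noise_std=0.1, seed=0):
    """Simulate the bandit protocol against an oblivious adversary.

    functions  -- one dict per round mapping arm -> mean reward
                  (assumption: fixed in advance, finite arm set)
    choose_arm -- callable(history) -> arm, where history is the list of
                  (arm, noisy_reward) pairs the learner has seen so far
    Returns the regret on mean rewards against the best fixed arm in hindsight.
    """
    rng = random.Random(seed)
    history = []
    learner_total = 0.0
    for f in functions:
        arm = choose_arm(history)
        # The learner only sees the noisy reward (Gaussian noise is an assumption).
        noisy_reward = f[arm] + rng.gauss(0.0, noise_std)
        history.append((arm, noisy_reward))
        # Regret is tracked on mean rewards here, which is an illustrative choice.
        learner_total += f[arm]
    arms = list(functions[0].keys())
    best_fixed_total = max(sum(f[a] for f in functions) for a in arms)
    return best_fixed_total - learner_total


def always_a(history):
    """A deliberately naive learner that ignores feedback."""
    return "a"


# Ten rounds in which arm "b" is always better than arm "a".
functions = [{"a": 0.25, "b": 0.75}] * 10
regret = play(functions, always_a)
```

In this toy run the naive learner's regret grows linearly with the number of rounds. In the abstract's terms, a function class is learnable when some algorithm achieves sublinear regret, meaning the regret divided by the number of rounds tends to zero.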
