The Adaptivity Barrier in Batched Nonparametric Bandits: Sharp Characterization of the Price of Unknown Margin
Rong Jiang, Cong Ma

TL;DR
This paper characterizes the fundamental cost of adapting to an unknown margin in batched nonparametric bandits, revealing a polynomial regret inflation barrier that diminishes with more frequent batching.
Contribution
The paper introduces the regret inflation criterion, characterizes the optimal (polynomial) regret inflation, and proposes RoBIN, an algorithm that attains this optimum up to polylogarithmic factors.
Findings
The optimal regret inflation grows polynomially with the horizon T.
The optimal batch allocation is prescribed by the minimizer of a convex optimization problem.
The adaptivity barrier disappears with more than log log T batches.
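To get a feel for how small the log log T threshold is, the sketch below evaluates it for a concrete horizon. It also builds one standard doubly-exponential grid from the batched-bandit literature, with hypothetical endpoints t_i = ceil(T^(1 - 2^(-i))). This is an illustration under stated assumptions, not RoBIN's batch allocation: the paper obtains its allocation from a convex program that is not reproduced here, and the logarithm base is an assumption because the summary does not fix one. The helper names `loglog` and `exponent_doubling_grid` are hypothetical.

```python
import math


def loglog(T: float, base: float = math.e) -> float:
    """Return log_b(log_b(T)); the default natural base is an assumption."""
    return math.log(math.log(T, base), base)


def exponent_doubling_grid(T: int, M: int) -> list[int]:
    """Hypothetical batch endpoints t_i = ceil(T ** (1 - 2 ** -i)) for i = 1..M-1, then t_M = T.

    A standard doubly-exponential grid used for illustration only;
    it is NOT the allocation prescribed by the paper's convex program.
    """
    ends = [math.ceil(T ** (1 - 2.0 ** -i)) for i in range(1, M)]
    return ends + [T]


if __name__ == "__main__":
    T = 10**9
    print(f"ln ln T     = {loglog(T):.2f}")     # roughly 3.0
    print(f"log2 log2 T = {loglog(T, 2):.2f}")  # roughly 4.9
    # Choose M so that M - 1 >= log2 log2 T; then T / t_{M-1} <= T ** (2 ** -(M-1)) <= 2.
    M = math.ceil(loglog(T, 2)) + 1
    grid = exponent_doubling_grid(T, M)
    print(M, grid)
    print(f"last-batch ratio T / t_(M-1) = {T / grid[-2]:.3f}")
```

In this grid the ratio T / t_(M-1) is at most T^(2^(-(M-1))), which drops to 2 or below once M - 1 is at least log2 log2 T. That is one common back-of-the-envelope way to see why doubly logarithmic batch counts appear; the paper's own threshold comes from its sharper analysis.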
Abstract
We study batched nonparametric contextual bandits under a margin condition when the margin parameter is unknown. To capture the statistical cost of this ignorance, we introduce the regret inflation criterion, defined as the ratio between the regret of an adaptive algorithm and that of an oracle that knows the margin parameter. We show that the optimal regret inflation grows polynomially with the horizon T, with an exponent given by the value of a convex optimization problem that depends on the dimension, the smoothness, and the number of batches. Moreover, the minimizer of this optimization problem directly prescribes the batch allocation and exploration strategy of a rate-optimal algorithm. Building on this principle, we develop RoBIN (RObust batched algorithm with adaptive BINning), which achieves the optimal regret inflation up to polylogarithmic factors. These results reveal a new adaptivity…
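The regret inflation criterion can be made concrete with a small sketch. It assumes, as a labeled assumption that the abstract does not spell out, that the ratio is evaluated in the worst case over a finite grid of candidate margin values, and it reads off the polynomial exponent gamma from inflation = T^gamma. The function names `worst_case_inflation` and `inflation_exponent`, and the power-law exponents in the demo, are made up for illustration and are not rates from the paper.

```python
import math


def worst_case_inflation(adaptive: dict[float, float], oracle: dict[float, float]) -> float:
    """Max over candidate margin values of R_adaptive / R_oracle at a fixed horizon T.

    Both dicts map a (hypothetical) margin value to a regret; taking the
    worst case over margins is an assumption of this sketch.
    """
    if adaptive.keys() != oracle.keys():
        raise ValueError("adaptive and oracle regrets must cover the same margin values")
    if any(r <= 0 for r in oracle.values()):
        raise ValueError("oracle regrets must be positive")
    return max(adaptive[a] / oracle[a] for a in adaptive)


def inflation_exponent(inflation: float, T: float) -> float:
    """Solve inflation = T ** gamma for gamma, i.e. the polynomial growth rate in T."""
    return math.log(inflation) / math.log(T)


if __name__ == "__main__":
    T = 10**6
    # Made-up regret exponents per margin value; NOT rates from the paper.
    oracle_exp = {0.5: 0.60, 1.0: 0.50, 2.0: 0.40}
    adaptive_exp = {0.5: 0.65, 1.0: 0.60, 2.0: 0.55}
    oracle = {a: T ** e for a, e in oracle_exp.items()}
    adaptive = {a: T ** e for a, e in adaptive_exp.items()}
    infl = worst_case_inflation(adaptive, oracle)
    print(f"worst-case inflation = {infl:.1f}, exponent gamma = {inflation_exponent(infl, T):.3f}")
```

Taking a maximum over margins encodes the requirement that one algorithm, run without knowing the margin, must compete with the oracle for every margin value at once; if the paper aggregates differently, only `worst_case_inflation` would need to change.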
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Game Theory and Applications
