Batched Stochastic Bandit for Nondegenerate Functions
Yu Liu, Yunlu Shu, Tianyu Wang

TL;DR
This paper introduces the Geometric Narrowing (GN) algorithm for batched stochastic bandit problems with nondegenerate functions, achieving near-optimal regret with very few batches, and provides matching lower bounds.
Contribution
The paper proposes the GN algorithm, which attains near-optimal regret using only $\mathcal{O}(\log\log T)$ batches, and establishes lower bounds showing that the algorithm is near-optimal.
Findings
GN achieves regret of order $\widetilde{\mathcal{O}}(A_+^d \sqrt{T})$, against a lower bound of order $\Omega(A_-^d \sqrt{T})$
GN requires only $\mathcal{O}(\log\log T)$ batches to perform near-optimally
Lower bounds show that no policy can match this regret order on all problem instances with fewer than $\Omega(\log\log T)$ rounds of communication
Abstract
This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{\mathcal{O}}(A_+^d \sqrt{T})$. In addition, GN only needs $\mathcal{O}(\log\log T)$ batches to achieve this regret. We also provide lower bound analysis for this problem. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. For any policy $\pi$, there exists a problem instance on which $\pi$ admits a regret of order $\Omega(A_-^d \sqrt{T})$; 2. No policy can achieve a regret of order $A_-^d \sqrt{T}$ over all problem instances, using fewer than $\Omega(\log\log T)$ rounds of communication. Our lower bound…
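To give some intuition for why a doubly logarithmic number of batches can be enough, the sketch below builds a batch grid of the form $t_i = \lceil T^{1 - 2^{-i}} \rceil$, a device that is common in the batched bandit literature. This is an illustrative assumption and is not taken from this page: the schedule GN actually uses may differ, and the function name `batch_endpoints` is hypothetical.

```python
import math


def batch_endpoints(T: int) -> list[int]:
    """Hypothetical batch grid t_i = ceil(T ** (1 - 2 ** -i)), closed by a final batch ending at T.

    Assumption for illustration only: this is a generic grid from the
    batched-bandit literature, not necessarily the schedule used by GN.
    """
    endpoints: list[int] = []
    i = 1
    while True:
        t = math.ceil(T ** (1.0 - 2.0 ** (-i)))
        # Stop once the next endpoint is within a factor 2 of T,
        # or if rounding stops the grid from growing (very small T).
        if T / t <= 2 or (endpoints and t <= endpoints[-1]):
            break
        endpoints.append(t)
        i += 1
    endpoints.append(T)  # the last batch always runs to the horizon
    return endpoints


if __name__ == "__main__":
    for horizon in (10**3, 10**6, 10**9):
        grid = batch_endpoints(horizon)
        print(horizon, len(grid), grid)
```

Each step halves the exponent gap $2^{-i}$, so the gap factor $T^{2^{-i}}$ drops below 2 after about $\log_2 \log_2 T$ steps. For $T = 10^6$ this grid has five batches.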
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
