Batched Stochastic Bandit for Nondegenerate Functions

Yu Liu; Yunlu Shu; Tianyu Wang

arXiv:2405.05733 · stat.ML · April 9, 2025

TL;DR

This paper introduces the Geometric Narrowing (GN) algorithm for batched stochastic bandit problems with nondegenerate functions, achieving near-optimal regret with very few batches, and provides matching lower bounds.

Contribution

The paper proposes the GN algorithm, which attains near-optimal regret using only $O(\log\log T)$ batches, and establishes lower bounds showing that the algorithm is near-optimal.

Findings

01

GN achieves a regret upper bound of order $\widetilde{O}(A_{+}^{d} \sqrt{T})$; the corresponding lower bound is of order $\Omega(A_{-}^{d} \sqrt{T})$

02

GN requires only $O(\log\log T)$ batches to perform near-optimally

03

Lower bounds show that no policy can achieve regret of order $A_{-}^{d} \sqrt{T}$ over all problem instances using fewer than $\Omega(\log\log T)$ rounds of communication

Abstract

This paper studies batched bandit learning problems for nondegenerate functions. We introduce an algorithm that solves the batched bandit problem for nondegenerate functions near-optimally. More specifically, we introduce an algorithm, called Geometric Narrowing (GN), whose regret bound is of order $\widetilde{O}(A_{+}^{d} \sqrt{T})$. In addition, GN only needs $O(\log\log T)$ batches to achieve this regret. We also provide lower bound analysis for this problem. More specifically, we prove that over some (compact) doubling metric space of doubling dimension $d$: 1. For any policy $\pi$, there exists a problem instance on which $\pi$ admits a regret of order $\Omega(A_{-}^{d} \sqrt{T})$; 2. No policy can achieve a regret of order $A_{-}^{d} \sqrt{T}$ over all problem instances, using less than $\Omega(\log\log T)$ rounds of communications. Our lower bound…
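
To give a feel for why a number of batches on the order of $\log\log T$ can be enough, the sketch below builds a batch grid of a kind commonly used in the batched-bandit literature, with batch end points $t_i = \lceil T^{1 - 2^{-i}} \rceil$. This is an assumption made purely for illustration: the abstract does not state GN's actual batch schedule, and the function name `doubling_exponent_grid` is hypothetical.

```python
import math


def doubling_exponent_grid(T: int) -> list[int]:
    """Return illustrative batch end points t_i = ceil(T ** (1 - 2 ** -i)).

    Assumption: this is a generic schedule from the batched-bandit
    literature, NOT the schedule used by the GN algorithm in the paper.
    The final batch always ends at the horizon T.
    """
    if T < 1:
        raise ValueError("horizon T must be a positive integer")
    if T <= 2:
        # log2(log2(T)) is undefined or zero here; a single batch suffices.
        return [T]

    # After about log2(log2(T)) steps, T ** (1 - 2 ** -i) is already T / 2,
    # so one extra batch ending at T completes the grid.
    num_batches = max(1, math.ceil(math.log2(math.log2(T)))) + 1

    grid: list[int] = []
    for i in range(1, num_batches):
        t = min(T, math.ceil(T ** (1.0 - 2.0 ** (-i))))
        if not grid or t > grid[-1]:  # keep end points strictly increasing
            grid.append(t)
    if grid[-1] < T:
        grid.append(T)
    return grid


if __name__ == "__main__":
    for horizon in (10**3, 10**6, 10**9):
        grid = doubling_exponent_grid(horizon)
        print(horizon, len(grid), grid)
```

Under this assumed schedule the number of batches grows by only one each time the horizon is squared, which is the sense in which a $\log\log T$ batch budget is very small compared with $T$.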

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques