Batched Nonparametric Bandits via k-Nearest Neighbor UCB

Sakshi Arya

arXiv:2505.10498 · stat.ML · August 4, 2025

TL;DR

This paper introduces BaNk-UCB, a nonparametric batched contextual bandit algorithm that uses k-NN regression and UCB to adaptively balance exploration and exploitation, achieving near-optimal regret guarantees.

Contribution

It proposes a fully nonparametric, adaptive algorithm for batched contextual bandits that improves over prior parametric and binning-based methods.

Findings

1. BaNk-UCB outperforms binning-based baselines in experiments.
2. It achieves near-optimal regret bounds under standard assumptions.
3. The method adapts to context dimension and is simple to implement.

Abstract

We study sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon divided into a small number of batches. Motivated by constraints in domains such as medicine and marketing -- where online feedback is limited -- we propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle. Our method, BaNk-UCB, is fully nonparametric, adapts to the context dimension, and is simple to implement. Unlike prior work relying on parametric or binning-based estimators, BaNk-UCB uses local geometry to estimate rewards and adaptively balances exploration and exploitation. We provide near-optimal regret guarantees under standard Lipschitz smoothness and margin assumptions, using a theoretically motivated batch schedule that balances regret across batches and achieves…
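
To make the idea in the abstract concrete, here is a minimal sketch of a batched k-NN UCB loop. It is not the paper's BaNk-UCB: it assumes a fixed `k` rather than an adaptive one, and the exploration bonus (a `1/sqrt(k)` term plus the distance to the k-th neighbor) and the batch end points are illustrative choices, not the paper's confidence width or its theoretically motivated schedule. The names `knn_ucb_index`, `run_batched_knn_ucb`, and `reward_fn` are hypothetical.

```python
import numpy as np


def knn_ucb_index(x, X_hist, r_hist, k):
    """k-NN mean reward estimate at context x plus an exploration bonus."""
    n = len(r_hist)
    if n == 0:
        return np.inf  # no data for this arm yet: force exploration
    k_eff = min(k, n)
    dists = np.linalg.norm(X_hist - x, axis=1)
    nn = np.argsort(dists)[:k_eff]  # indices of the k_eff nearest contexts
    mean = r_hist[nn].mean()
    # Assumed bonus: a variance-type term plus the k-th neighbor radius
    # (a proxy for local bias under Lipschitz smoothness).
    bonus = np.sqrt(1.0 / k_eff) + dists[nn[-1]]
    return mean + bonus


def run_batched_knn_ucb(contexts, reward_fn, n_arms, batch_ends, k, rng):
    """Play one round per context; rewards become visible only at batch ends."""
    d = contexts.shape[1]
    X_hist = [np.empty((0, d)) for _ in range(n_arms)]
    r_hist = [np.empty(0) for _ in range(n_arms)]
    buf_X = [[] for _ in range(n_arms)]  # contexts seen in the current batch
    buf_r = [[] for _ in range(n_arms)]  # rewards withheld until batch end
    actions = np.empty(len(contexts), dtype=int)
    ends = set(batch_ends)
    for t, x in enumerate(contexts):
        idx = np.array(
            [knn_ucb_index(x, X_hist[a], r_hist[a], k) for a in range(n_arms)]
        )
        best = np.flatnonzero(idx == idx.max())
        a = int(rng.choice(best))  # break ties uniformly at random
        actions[t] = a
        buf_X[a].append(x)
        buf_r[a].append(reward_fn(x, a, rng))
        if (t + 1) in ends:
            # Batch boundary: release the buffered feedback to the estimator.
            for b in range(n_arms):
                if buf_r[b]:
                    X_hist[b] = np.vstack([X_hist[b], np.array(buf_X[b])])
                    r_hist[b] = np.concatenate([r_hist[b], np.array(buf_r[b])])
                    buf_X[b], buf_r[b] = [], []
    return actions


# Toy usage: two arms on [0, 1]^2 with Lipschitz mean rewards (hypothetical).
def reward_fn(x, a, rng):
    mean = x[0] if a == 0 else 1.0 - x[0]
    return mean + 0.1 * rng.normal()


rng = np.random.default_rng(0)
contexts = rng.uniform(size=(400, 2))
batch_ends = [50, 150, 400]  # illustrative grid, not the paper's schedule
actions = run_batched_knn_ucb(contexts, reward_fn, 2, batch_ends, k=10, rng=rng)
```

Within a batch the policy is frozen, because the estimator only sees rewards from completed batches. In this sketch the first batch is therefore played uniformly at random, and exploitation begins only after the first boundary.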

Taxonomy

Topics: Advanced Bandit Algorithms Research