Batched Nonparametric Bandits via k-Nearest Neighbor UCB
Sakshi Arya

TL;DR
This paper introduces BaNk-UCB, a nonparametric batched contextual bandit algorithm that uses k-NN regression and UCB to adaptively balance exploration and exploitation, achieving near-optimal regret guarantees.
Contribution
The paper proposes a fully nonparametric, adaptive algorithm for batched contextual bandits that estimates rewards from local geometry via k-NN regression, rather than the parametric or binning-based estimators used in prior work.
Findings
BaNk-UCB outperforms binning-based baselines in experiments.
It achieves near-optimal regret bounds under standard Lipschitz smoothness and margin assumptions.
The method adapts to context dimension and is simple to implement.
Abstract
We study sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon divided into a small number of batches. Motivated by constraints in domains such as medicine and marketing -- where online feedback is limited -- we propose a nonparametric algorithm that combines adaptive k-nearest neighbor (k-NN) regression with the upper confidence bound (UCB) principle. Our method, BaNk-UCB, is fully nonparametric, adapts to the context dimension, and is simple to implement. Unlike prior work relying on parametric or binning-based estimators, BaNk-UCB uses local geometry to estimate rewards and adaptively balances exploration and exploitation. We provide near-optimal regret guarantees under standard Lipschitz smoothness and margin assumptions, using a theoretically motivated batch schedule that balances regret across batches and achieves…
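To make the combination of k-NN regression, UCB, and batching concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it uses a fixed k instead of the adaptive choice of k, the exploration bonus (a 1/sqrt(k) term plus the distance to the k-th neighbor) is an assumed form, and the batch end points are passed in by hand rather than taken from the paper's schedule. The names `knn_ucb_index`, `run_batched_knn_ucb`, `bonus_scale`, and `batch_ends` are hypothetical.

```python
import numpy as np


def knn_ucb_index(x, contexts, rewards, k, bonus_scale=1.0):
    """Illustrative UCB index for one arm at context x (assumed form)."""
    n = len(rewards)
    if n == 0:
        return np.inf  # an arm with no visible data is always explored
    k_eff = min(k, n)
    dists = np.linalg.norm(contexts - x, axis=1)
    nearest = np.argsort(dists)[:k_eff]
    mean = rewards[nearest].mean()          # k-NN regression estimate
    radius = dists[nearest].max()           # distance to the k-th neighbor
    # Assumed bonus: a variance-like term plus a bias-like (Lipschitz) term.
    return mean + bonus_scale * np.sqrt(1.0 / k_eff) + radius


def run_batched_knn_ucb(reward_fns, horizon, batch_ends, dim,
                        k=10, noise=0.1, seed=0):
    """Simulate a batched policy; feedback is revealed only at batch ends."""
    rng = np.random.default_rng(seed)
    n_arms = len(reward_fns)
    # Data the policy may use: rewards from completed batches only.
    seen_x = [np.empty((0, dim)) for _ in range(n_arms)]
    seen_r = [np.empty(0) for _ in range(n_arms)]
    # Rewards collected in the current batch, hidden until it closes.
    buf_x = [[] for _ in range(n_arms)]
    buf_r = [[] for _ in range(n_arms)]
    pulls = np.zeros(n_arms, dtype=int)
    ends = set(batch_ends)

    for t in range(1, horizon + 1):
        x = rng.uniform(size=dim)  # assumed: contexts uniform on [0, 1]^dim
        idx = np.array([knn_ucb_index(x, seen_x[a], seen_r[a], k)
                        for a in range(n_arms)])
        # Break ties (including all-infinite indices) uniformly at random.
        a = int(rng.choice(np.flatnonzero(idx == idx.max())))
        r = reward_fns[a](x) + noise * rng.normal()
        buf_x[a].append(x)
        buf_r[a].append(r)
        pulls[a] += 1

        if t in ends:  # batch closes: release buffered feedback
            for b in range(n_arms):
                if buf_x[b]:
                    seen_x[b] = np.vstack([seen_x[b], np.array(buf_x[b])])
                    seen_r[b] = np.concatenate([seen_r[b], np.array(buf_r[b])])
                    buf_x[b], buf_r[b] = [], []
    return pulls


if __name__ == "__main__":
    # Two hypothetical arms whose best choice depends on the first coordinate.
    arms = [lambda x: x[0], lambda x: 1.0 - x[0]]
    print(run_batched_knn_ucb(arms, horizon=500,
                              batch_ends=[25, 100, 500], dim=2))
```

The key batched constraint is visible in the loop: rewards go into a buffer and only become usable by `knn_ucb_index` once the current batch ends, so every decision within a batch relies on data from earlier batches alone.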
Taxonomy
Topics: Advanced Bandit Algorithms Research
