Optimal Regret for Single Index Bandits

Devdan Dey; Sujoy Bhore; Avishek Ghosh

arXiv:2605.09454 · stat.ML · May 12, 2026

Abstract

We study the single-index bandit problem, where rewards depend on an unknown one-dimensional projection of high-dimensional contexts through an unknown reward function. This model extends linear and generalized linear bandits to a nonparametric setting, and is particularly relevant when the reward function is not known in advance. While optimal regret guarantees are known for monotone reward functions, the general non-monotone case remains poorly understood: the best known bound is $\tilde{O}(T^{3/4})$, under standard boundedness and Lipschitz assumptions on the reward function [Kang et al., 2025]. We close this gap by establishing the optimal regret for general single-index bandits. We propose a simple two-phase algorithm, Zoomed Single Index Bandit with Upper Confidence Bound (ZoomSIB-UCB), that first estimates the projection…
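To make the setting concrete, here is a minimal Python sketch of the reward model described above: the mean reward of a context x is f(θᵀx) for an unknown unit-norm vector θ and an unknown bounded, Lipschitz, possibly non-monotone link f, and regret is measured against the best arm in each round. It assumes a finite set of arms whose contexts are redrawn each round on the unit sphere and additive Gaussian noise; the names `SingleIndexBandit` and `cumulative_regret` are hypothetical. This only simulates the problem and scores a policy; it is not the authors' ZoomSIB-UCB algorithm.

```python
import numpy as np


class SingleIndexBandit:
    """Toy simulator of a single-index bandit (illustrative, not from the paper).

    Mean reward of context x is f(theta . x); the observed reward adds Gaussian
    noise. theta and f are hidden from the learner and used only for simulation.
    """

    def __init__(self, d, n_arms, f, noise_std=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        theta = self.rng.normal(size=d)
        self.theta = theta / np.linalg.norm(theta)  # assumed unit-norm index vector
        self.f = f
        self.d = d
        self.n_arms = n_arms
        self.noise_std = noise_std

    def contexts(self):
        # Fresh arm contexts each round, uniform on the unit sphere (assumption).
        X = self.rng.normal(size=(self.n_arms, self.d))
        return X / np.linalg.norm(X, axis=1, keepdims=True)

    def means(self, X):
        # Noise-free expected rewards f(theta^T x) for every arm.
        return self.f(X @ self.theta)

    def pull(self, x):
        # Noisy reward that a learner would observe for context x.
        return self.f(x @ self.theta) + self.noise_std * self.rng.normal()


def cumulative_regret(bandit, policy, T):
    """Sum over T rounds of (best mean reward - mean reward of the chosen arm)."""
    total = 0.0
    for _ in range(T):
        X = bandit.contexts()
        a = policy(X)
        bandit.pull(X[a])  # a learning policy would update on this feedback
        mu = bandit.means(X)
        total += mu.max() - mu[a]
    return total


# Usage: a non-monotone, bounded, 3-Lipschitz link on [-1, 1].
f = lambda z: np.sin(3 * z)
bandit = SingleIndexBandit(d=5, n_arms=10, f=f)
oracle = lambda X: int(np.argmax(f(X @ bandit.theta)))  # knows theta and f
always_first = lambda X: 0                              # ignores the contexts
print(cumulative_regret(bandit, oracle, 100))        # 0.0
print(cumulative_regret(bandit, always_first, 100))  # positive, grows with T
```

Because f is non-monotone here, ranking arms by the projection θᵀx alone (as one could for a monotone link) does not identify the best arm, which illustrates why the non-monotone case is harder.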