Kernel Single-Index Bandits: Estimation, Inference, and Learning

Sakshi Arya; Satarupa Bhattacharjee; Bharath K. Sriperumbudur

arXiv:2603.18938 · stat.ML · March 20, 2026

Open Access

TL;DR

This paper introduces a kernelized $ε$-greedy algorithm for single-index contextual bandits, enabling flexible semiparametric learning, valid inference, and finite-time regret guarantees in adaptive, dependent data settings.

Contribution

It develops a novel kernelized $ε$-greedy method with asymptotic inference tools for single-index bandits, addressing adaptive sampling and dependent observations.

Findings

01

Asymptotic normality for index estimators under adaptive sampling.

02

Valid confidence intervals for reward functions via a functional CLT.

03

Finite-time regret bounds of $\tilde{O}(\sqrt{T})$ under Lipschitz conditions.

Abstract

We study contextual bandits with finitely many actions in which the reward of each arm follows a single-index model with an arm-specific index parameter and an unknown nonparametric link function. We consider a regime in which arms correspond to stable decision options and covariates evolve adaptively under the bandit policy. This setting creates significant statistical challenges: the sampling distribution depends on the allocation rule, observations are dependent over time, and inverse-propensity weighting induces variance inflation. We propose a kernelized $ε$-greedy algorithm that combines Stein-based estimation of the index parameters with inverse-propensity-weighted kernel ridge regression for the reward functions. This approach enables flexible semiparametric learning while retaining interpretability. Our analysis develops new tools for inference with adaptively…
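The three ingredients named in the abstract ($ε$-greedy allocation, Stein-based index estimation, and inverse-propensity-weighted kernel ridge regression) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes standard Gaussian contexts (so that the first-order Stein identity makes $E[yX]$ proportional to the index), a Gaussian kernel on the one-dimensional index, and a ridge objective of the form $\sum_i w_i (y_i - f(u_i))^2 + \lambda \|f\|^2$. All function names and parameters are hypothetical.

```python
import numpy as np


def epsilon_greedy_probs(values, eps):
    """Propensities of an eps-greedy rule: eps is spread uniformly over
    the arms and the remaining 1 - eps goes to the greedy arm."""
    values = np.asarray(values, dtype=float)
    probs = np.full(values.shape[0], eps / values.shape[0])
    probs[np.argmax(values)] += 1.0 - eps
    return probs


def stein_index_estimate(X, y, w):
    """Weighted first-order Stein estimate of the index direction.

    Assumption (not from the source): rows of X are standard Gaussian,
    so E[y X] is proportional to the index. w holds inverse-propensity
    weights (zero for rounds in which the arm was not pulled).
    """
    raw = (w * y) @ X / X.shape[0]
    return raw / np.linalg.norm(raw)  # the scale is not identified


def gaussian_kernel(u, v, bandwidth):
    """Gaussian kernel matrix between two 1-D arrays of index values."""
    diff = u[:, None] - v[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)


def ipw_kernel_ridge(u, y, w, lam, bandwidth):
    """Weighted kernel ridge regression on the estimated index u.

    Minimizes sum_i w_i (y_i - f(u_i))^2 + lam * ||f||^2, whose
    coefficients solve (W K + lam I) alpha = W y.
    Returns a function that predicts at new index values.
    """
    K = gaussian_kernel(u, u, bandwidth)
    alpha = np.linalg.solve(np.diag(w) @ K + lam * np.eye(len(u)), w * y)
    return lambda u_new: gaussian_kernel(
        np.asarray(u_new, dtype=float), u, bandwidth
    ) @ alpha
```

In a bandit loop, one would typically refit `stein_index_estimate` and `ipw_kernel_ridge` per arm on the rounds where that arm was pulled, with weights equal to the reciprocal of the propensity returned by `epsilon_greedy_probs`. The inverse weights are the source of the variance inflation the abstract mentions.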

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Advanced Causal Inference Techniques