Nonparametric Bandits with Single-Index Rewards: Optimality and Adaptivity

Wanteng Ma; T. Tony Cai

arXiv:2512.24669·math.ST·January 1, 2026

Open Access

TL;DR

This paper introduces a single-index model approach for nonparametric contextual bandits, achieving optimal regret bounds independent of high-dimensional covariates, and explores adaptivity limitations and solutions.

Contribution

The paper develops a nonasymptotic theory for single-index regression, proposes an optimal bandit algorithm, and analyzes its adaptivity and phase-transition properties in high dimensions.

Findings

01. Achieves minimax-optimal regret independent of ambient dimension $d$.

02. Establishes a lower bound matching the algorithm's regret, confirming optimality.

03. Demonstrates a phase transition in regret behavior as dimension increases.

Abstract

Contextual bandits are a central framework for sequential decision-making, with applications ranging from recommendation systems to clinical trials. While nonparametric methods can flexibly model complex reward structures, they suffer from the curse of dimensionality. We address this challenge using a single-index model, which projects high-dimensional covariates onto a one-dimensional subspace while preserving nonparametric flexibility. We first develop a nonasymptotic theory for offline single-index regression for each arm, combining maximum rank correlation for index estimation with local polynomial regression. Building on this foundation, we propose a single-index bandit algorithm and establish its convergence rate. We further derive a matching lower bound, showing that the algorithm achieves minimax-optimal regret independent of the ambient dimension $d$, thereby overcoming the…
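
The offline step named in the abstract (maximum rank correlation for the index, then local polynomial regression on the projected covariate) can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a single arm, a reward of the form $f(x^\top \beta)$ with an increasing link $f$, a covariate dimension of 2 so that the index can be found by grid search over angles, and a degree-1 (local linear) fit with an Epanechnikov kernel. All function names, the `tanh` link, the noise level, and the bandwidth are hypothetical choices made for illustration.

```python
import math
import random


def mrc_objective(beta, xs, ys):
    """Count ordered pairs (i, j) with y_i > y_j and x_i.beta > x_j.beta."""
    z = [sum(b * v for b, v in zip(beta, x)) for x in xs]
    n = len(ys)
    return sum(1 for i in range(n) for j in range(n)
               if ys[i] > ys[j] and z[i] > z[j])


def fit_index_2d(xs, ys, grid=120):
    """Maximize the rank-correlation objective over unit vectors (d = 2 only)."""
    best_beta, best_score = None, -1
    for k in range(grid):
        theta = 2.0 * math.pi * k / grid
        beta = (math.cos(theta), math.sin(theta))
        score = mrc_objective(beta, xs, ys)
        if score > best_score:
            best_beta, best_score = beta, score
    return best_beta


def local_linear(z0, zs, ys, h):
    """Local linear estimate of E[y | z = z0] with an Epanechnikov kernel."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for z, y in zip(zs, ys):
        d = z - z0
        u = d / h
        w = max(0.0, 0.75 * (1.0 - u * u))  # kernel weight, zero outside |u| < 1
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det <= 1e-12:  # too few points in the window: fall back to a local mean
        return t0 / s0 if s0 > 0 else float("nan")
    return (s2 * t0 - s1 * t1) / det  # intercept of the weighted least-squares line


def demo(seed=0, n=150):
    """Simulate one arm (assumed setup) and return (alignment, estimate of f(0))."""
    rng = random.Random(seed)
    beta_true = (0.6, 0.8)  # hypothetical unit-norm index
    xs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    ys = [math.tanh(2.0 * (x[0] * beta_true[0] + x[1] * beta_true[1]))
          + rng.gauss(0.0, 0.1) for x in xs]
    beta_hat = fit_index_2d(xs, ys)
    zs = [x[0] * beta_hat[0] + x[1] * beta_hat[1] for x in xs]
    alignment = beta_hat[0] * beta_true[0] + beta_hat[1] * beta_true[1]
    return alignment, local_linear(0.0, zs, ys, h=0.3)


if __name__ == "__main__":
    print(demo())
```

The rank-based objective depends only on the ordering of the responses, which is why the index can be estimated without knowing the link; the second stage is then a one-dimensional smoothing problem regardless of the covariate dimension. The grid search here does not scale beyond two dimensions and stands in for whatever optimizer a real implementation would use.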

Taxonomy

Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques · Stochastic Gradient Optimization Techniques