Nonparametric Bandits with Single-Index Rewards: Optimality and Adaptivity
Wanteng Ma, T. Tony Cai

TL;DR
This paper introduces a single-index model approach for nonparametric contextual bandits, achieving optimal regret bounds independent of high-dimensional covariates, and explores adaptivity limitations and solutions.
Contribution
It develops a nonasymptotic theory for single-index regression, proposes an optimal bandit algorithm, and analyzes its adaptivity and phase transition properties in high dimensions.
Findings
Achieves minimax-optimal regret independent of ambient dimension d.
Establishes a lower bound matching the algorithm's regret, confirming optimality.
Demonstrates a phase transition in regret behavior as dimension increases.
Abstract
Contextual bandits are a central framework for sequential decision-making, with applications ranging from recommendation systems to clinical trials. While nonparametric methods can flexibly model complex reward structures, they suffer from the curse of dimensionality. We address this challenge using a single-index model, which projects high-dimensional covariates onto a one-dimensional subspace while preserving nonparametric flexibility. We first develop a nonasymptotic theory for offline single-index regression for each arm, combining maximum rank correlation for index estimation with local polynomial regression. Building on this foundation, we propose a single-index bandit algorithm and establish its convergence rate. We further derive a matching lower bound, showing that the algorithm achieves minimax-optimal regret independent of the ambient dimension d, thereby overcoming the…
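The offline step described in the abstract (maximum rank correlation for the index, then local polynomial regression along the projected covariate) can be illustrated for a single arm with the rough sketch below. It is not the paper's procedure: the random search over unit vectors, the Epanechnikov kernel, the degree-one (local linear) fit, the tanh link, the bandwidth, and all function names are assumptions made here for illustration only.

```python
import numpy as np

def mrc_objective(beta, X, y):
    """Maximum rank correlation objective: share of ordered pairs (i, j)
    with y_i > y_j and x_i'beta > x_j'beta."""
    z = X @ beta
    dz = z[:, None] - z[None, :]
    dy = y[:, None] - y[None, :]
    return np.mean((dz > 0) & (dy > 0))

def estimate_index_mrc(X, y, n_candidates=1000, seed=0):
    """Crude maximizer: random search over unit vectors.
    (Assumption for illustration; the paper's optimizer is not stated here.)"""
    rng = np.random.default_rng(seed)
    cands = rng.normal(size=(n_candidates, X.shape[1]))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    scores = [mrc_objective(b, X, y) for b in cands]
    return cands[int(np.argmax(scores))]

def local_linear(z_train, y_train, z0, h):
    """Local linear estimate of E[y | z = z0] with an Epanechnikov kernel."""
    u = (z_train - z0) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
    A = np.column_stack([np.ones_like(z_train), z_train - z0])
    WA = A * w[:, None]
    coef, *_ = np.linalg.lstsq(WA.T @ A, WA.T @ y_train, rcond=None)
    return coef[0]  # intercept = fitted value at z0

# Synthetic single-arm data (hypothetical): monotone link tanh, d = 3.
rng = np.random.default_rng(1)
n, d = 200, 3
beta_true = np.array([1.0, 0.0, 0.0])
X = rng.normal(size=(n, d))
y = np.tanh(X @ beta_true) + 0.1 * rng.normal(size=n)

beta_hat = estimate_index_mrc(X, y)                # step 1: index direction
g_hat = local_linear(X @ beta_hat, y, 0.5, h=0.3)  # step 2: 1-D smoothing
print("alignment with true index:", float(beta_hat @ beta_true))
print("estimated reward at index 0.5:", float(g_hat))
```

The point of the sketch is that, once the index direction is estimated, the smoothing step is one-dimensional regardless of d. Note that maximum rank correlation relies on a monotone link, which is why tanh is used in the synthetic data.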
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Causal Inference Techniques · Stochastic Gradient Optimization Techniques
