Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes
Yichun Hu, Nathan Kallus, Xiaojie Mao

TL;DR
This paper introduces an algorithm for nonparametric contextual bandits that adapts to the smoothness of the reward functions, bridges the previously separate parametric and non-differentiable regimes, and achieves rate-optimal regret.
Contribution
It develops a novel adaptive algorithm that seamlessly interpolates between parametric and non-differentiable bandit settings, with proven rate-optimal regret bounds across all smoothness levels.
Findings
The algorithm achieves rate-optimal regret in all smoothness regimes.
Matching upper and lower bounds establish the theoretical optimality.
The work unifies and extends existing results on bandit regret regimes.
Abstract
We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter $\beta$. We show how this interpolates between two extremes that were previously studied in isolation: non-differentiable bandits ($\beta \le 1$), where rate-optimal regret is achieved by running separate non-contextual bandits in different context regions, and parametric-response bandits (satisfying $\beta = \infty$), where rate-optimal regret can be achieved with minimal or no exploration due to infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and non-differentiable…
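To make the non-differentiable extreme concrete, the following is a minimal sketch of "running separate non-contextual bandits in different context regions": the context space [0, 1] is cut into equal bins and an independent UCB1 learner runs inside each bin. This is not the paper's algorithm. The class name `BinnedUCB`, the function `simulate`, the two reward functions, the fixed bin count, and the choice of UCB1 as the per-bin learner are all assumptions made for illustration; in particular, the sketch does not tune the number of bins to the smoothness level or the horizon.

```python
import math
import random


class BinnedUCB:
    """Hypothetical sketch: one independent UCB1 learner per context bin on [0, 1]."""

    def __init__(self, n_bins, n_arms):
        self.n_bins = n_bins
        self.n_arms = n_arms
        # Per-bin, per-arm pull counts and reward sums; bins share no information.
        self.counts = [[0] * n_arms for _ in range(n_bins)]
        self.sums = [[0.0] * n_arms for _ in range(n_bins)]

    def bin_of(self, x):
        # Map a context in [0, 1] to a bin index (x = 1.0 falls in the last bin).
        return min(int(x * self.n_bins), self.n_bins - 1)

    def select(self, x):
        b = self.bin_of(x)
        counts = self.counts[b]
        # Play every arm once in this bin before using confidence bounds.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        total = sum(counts)

        def ucb(arm):
            mean = self.sums[b][arm] / counts[arm]
            return mean + math.sqrt(2.0 * math.log(total) / counts[arm])

        return max(range(self.n_arms), key=ucb)

    def update(self, x, arm, reward):
        b = self.bin_of(x)
        self.counts[b][arm] += 1
        self.sums[b][arm] += reward


def simulate(horizon=5000, n_bins=8, seed=0):
    """Run the binned policy on two assumed Lipschitz mean-reward functions."""
    rng = random.Random(seed)
    # Assumed expected rewards: arm 0 is flat, arm 1 grows with the context,
    # so the better arm switches at x = 0.5.
    mean_rewards = [lambda x: 0.5, lambda x: x]
    policy = BinnedUCB(n_bins, len(mean_rewards))
    regret = 0.0
    for _ in range(horizon):
        x = rng.random()
        arm = policy.select(x)
        means = [f(x) for f in mean_rewards]
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        policy.update(x, arm, reward)
        regret += max(means) - means[arm]
    return policy, regret
```

Calling `policy, regret = simulate()` returns the fitted per-bin statistics and the cumulative expected regret of the run. Because each bin learns only from its own samples, the sketch uses purely local information, which is the behaviour the abstract contrasts with the parametric extreme, where global extrapolation is possible.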
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
