Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

Yichun Hu; Nathan Kallus; Xiaojie Mao

arXiv:1909.02553·stat.ML·September 14, 2020·6 cites

TL;DR

This paper introduces an algorithm for nonparametric contextual bandits that adapts to the smoothness of the reward functions. It bridges the previously separate parametric and non-differentiable regimes and achieves rate-optimal regret.

Contribution

It develops a novel adaptive algorithm that seamlessly interpolates between parametric and non-differentiable bandit settings, with proven rate-optimal regret bounds across all smoothness levels.

Findings

01 The algorithm achieves rate-optimal regret in all smoothness regimes.

02 Matching upper and lower bounds establish the theoretical optimality.

03 The work unifies and extends existing results on bandit regret regimes.

Abstract

We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter $β$. We show how this interpolates between two extremes that were previously studied in isolation: non-differentiable bandits ($β \leq 1$), where rate-optimal regret is achieved by running separate non-contextual bandits in different context regions, and parametric-response bandits (satisfying $β = \infty$), where rate-optimal regret can be achieved with minimal or no exploration due to infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and non-differentiable…
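The non-differentiable extreme described in the abstract (separate non-contextual bandits in different context regions) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a one-dimensional context on [0, 1], equal-width bins, a standard UCB1 index inside each bin, and two hypothetical reward functions chosen only for illustration. The names `BinnedUCB` and `simulate` are assumptions, not taken from the paper or its repository.

```python
import math
import random


class BinnedUCB:
    """Independent UCB1 learners, one per equal-width context bin on [0, 1]."""

    def __init__(self, n_bins, n_arms):
        self.n_bins = n_bins
        self.n_arms = n_arms
        # Per-bin, per-arm pull counts and reward sums.
        self.counts = [[0] * n_arms for _ in range(n_bins)]
        self.sums = [[0.0] * n_arms for _ in range(n_bins)]

    def bin_of(self, x):
        # Map a context in [0, 1] to its bin; x == 1.0 goes to the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def select(self, x):
        b = self.bin_of(x)
        # Pull every arm once in this bin before using the UCB index.
        for a in range(self.n_arms):
            if self.counts[b][a] == 0:
                return a
        total = sum(self.counts[b])

        def index(a):
            n = self.counts[b][a]
            return self.sums[b][a] / n + math.sqrt(2.0 * math.log(total) / n)

        return max(range(self.n_arms), key=index)

    def update(self, x, arm, reward):
        b = self.bin_of(x)
        self.counts[b][arm] += 1
        self.sums[b][arm] += reward


def simulate(horizon=5000, n_bins=8, seed=0):
    """Run the binned policy on hypothetical Lipschitz reward functions."""
    rng = random.Random(seed)
    # Hypothetical expected rewards (assumption, for illustration only).
    means = [lambda x: abs(x - 0.5), lambda x: 0.25]
    policy = BinnedUCB(n_bins, len(means))
    regret = 0.0
    for _ in range(horizon):
        x = rng.random()
        arm = policy.select(x)
        mu = [f(x) for f in means]
        reward = mu[arm] + rng.gauss(0.0, 0.1)  # noisy observed reward
        policy.update(x, arm, reward)
        regret += max(mu) - mu[arm]  # cumulative expected regret
    return policy, regret
```

Calling `simulate()` returns the fitted policy and its cumulative expected regret. Because each bin learns only from its own samples, nothing is extrapolated across regions, which is the behavior the abstract contrasts with the parametric extreme.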

Code & Models

Repositories

CausalML/SmoothBandit

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management