Smooth Non-Stationary Bandits

Su Jia; Qian Xie; Nathan Kallus; Peter I. Frazier

arXiv:2301.12366 · cs.LG · November 19, 2024 · 1 citation

TL;DR

This paper introduces a new bandit algorithm tailored for smoothly changing environments, achieving lower regret than previous methods, especially when changes are highly smooth, and validates it with real-world data.

Contribution

It presents the first separation between smooth and non-smooth non-stationary bandit regimes, providing new regret bounds and a practical algorithm for smooth environments.

Findings

01

Achieved $\tilde{O}(k^{4/5} T^{3/5})$ regret for 2-Hölder functions.

02

Established minimax regret lower bounds matching the upper bounds for $\beta = 2$.

03

Validated the approach with real-world click-through rate data.
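To make the separation concrete, the short Python sketch below compares only the $T$-exponents of two rates quoted on this page: the $\tilde{\Theta}(T^{2/3})$ minimax rate for $\beta = 1$ and the $\tilde{O}(k^{4/5} T^{3/5})$ upper bound for $\beta = 2$. It assumes the number of arms $k$ is fixed and ignores constants and logarithmic factors, so the printed ratio $T^{2/3} / T^{3/5} = T^{1/15}$ shows how the gap scales with the horizon, not an actual regret value. The helper name `rate_gap` is hypothetical.

```python
# Exponents of T in the regret rates quoted on this page. Assumption of this
# sketch: k is held fixed and constants / log factors are ignored.
NON_SMOOTH_EXPONENT = 2 / 3   # beta = 1: minimax rate Theta~(T^{2/3})
SMOOTH_EXPONENT = 3 / 5       # beta = 2: upper bound O~(k^{4/5} T^{3/5})


def rate_gap(horizon: int) -> float:
    """Hypothetical helper: ratio T^{2/3} / T^{3/5} = T^{1/15}, i.e. how much
    the beta = 1 rate exceeds the beta = 2 rate at horizon T."""
    return horizon ** (NON_SMOOTH_EXPONENT - SMOOTH_EXPONENT)


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**9):
        print(f"T = {T:>13,d}: T^(2/3) / T^(3/5) = {rate_gap(T):.3f}")
```

For $T = 10^3$, $10^6$ and $10^9$ the ratio is about 1.585, 2.512 and 3.981. The advantage of the smooth regime grows with the horizon, but slowly, because the exponent gap is only $1/15$.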

Abstract

In many applications of online decision making, the environment is non-stationary, so it is crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time. In practice, however, environments often change smoothly, so such algorithms may incur higher-than-necessary regret. We study a non-stationary bandit problem where each arm's mean reward sequence can be embedded into a $\beta$-Hölder function, i.e., a function that is $(\beta - 1)$-times Lipschitz-continuously differentiable. The non-stationarity becomes smoother as $\beta$ increases. When $\beta = 1$, this corresponds to the non-smooth regime, where Besbes et al. (2014) established a minimax regret of $\tilde{\Theta}(T^{2/3})$. We show the first separation between the smooth…
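The problem setup can be made concrete with a small simulation. The Python sketch below is illustrative only and rests on assumptions not taken from the paper: a hypothetical two-armed instance with mean rewards $\mu_i(t) = f_i(t/T)$, where $f_1(x) = 0.5 + 0.3\sin(2\pi x)$ and $f_2(x) = 0.5 + 0.3\cos(2\pi x)$ (both have Lipschitz first derivatives, so they are 2-Hölder); Bernoulli rewards; and a generic sliding-window UCB heuristic as the policy. That heuristic is a swapped-in baseline, not the algorithm proposed in this paper. The function names `smooth_means` and `sliding_window_ucb`, the exploration bonus, and the window sizes are all hypothetical. The code reports dynamic pseudo-regret $\sum_t \big(\max_i \mu_i(t) - \mu_{a_t}(t)\big)$, i.e. regret measured against the best arm at each round.

```python
import math
import random
from collections import deque


def smooth_means(t: int, horizon: int) -> list[float]:
    """Hypothetical instance: mu_i(t) = f_i(t / T) with smooth f_i, so each
    mean sequence embeds into a 2-Hölder function (f_i' is Lipschitz)."""
    x = t / horizon
    return [0.5 + 0.3 * math.sin(2 * math.pi * x),
            0.5 + 0.3 * math.cos(2 * math.pi * x)]


def sliding_window_ucb(horizon: int, window: int, seed: int = 0) -> float:
    """Run a generic sliding-window UCB baseline (NOT this paper's algorithm)
    on the hypothetical instance and return its dynamic pseudo-regret."""
    rng = random.Random(seed)
    history = deque()          # (arm, reward) pairs from the last `window` rounds
    counts = [0, 0]            # pulls of each arm inside the window
    sums = [0.0, 0.0]          # total reward of each arm inside the window
    regret = 0.0
    for t in range(horizon):
        means = smooth_means(t, horizon)
        untried = [i for i in range(2) if counts[i] == 0]
        if untried:
            arm = untried[0]   # pull any arm with no data in the current window
        else:
            log_term = math.log(min(t, window))  # t >= 2 here, so log_term > 0
            arm = max(range(2), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * log_term / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        regret += max(means) - means[arm]    # gap to the best arm at round t
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward
        if len(history) > window:            # forget the oldest observation
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward
    return regret


if __name__ == "__main__":
    T = 20_000
    for w in (200, 1_000, 5_000):
        print(f"window={w:>5}: dynamic regret = {sliding_window_ucb(T, w):.1f}")
```

A window-based policy like this one only averages recent rewards, so it uses no information about how smoothly the means drift; the abstract's point is that exploiting smoothness can lower regret below what such non-smooth-regime methods achieve.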

Videos

Smooth Non-stationary Bandits · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Auction Theory and Applications