Non-Stationary Lipschitz Bandits

Nicolas Nguyen; Solenne Gaucher; Claire Vernade

arXiv:2505.18871·stat.ML·October 23, 2025

Open Access

TL;DR

This paper introduces an adaptive algorithm for non-stationary Lipschitz bandits with infinitely many actions, achieving a minimax-optimal dynamic regret bound without prior knowledge of the non-stationarity.

Contribution

The paper presents the first minimax-optimal algorithm for non-stationary Lipschitz bandits, one that adaptively detects significant reward shifts.

Findings

01

Achieves minimax-optimal dynamic regret of $O(\tilde{L}^{1/3} T^{2/3})$

02

Leverages hierarchical discretization to detect reward changes

03

No prior knowledge of non-stationarity required

Abstract

We study the problem of non-stationary Lipschitz bandits, where the number of actions is infinite and the reward function, satisfying a Lipschitz assumption, can change arbitrarily over time. We design an algorithm that adaptively tracks the recently introduced notion of significant shifts, defined by large deviations of the cumulative reward function. To detect such reward changes, our algorithm leverages a hierarchical discretization of the action space. Without requiring any prior knowledge of the non-stationarity, our algorithm achieves a minimax-optimal dynamic regret bound of $O(\tilde{L}^{1/3} T^{2/3})$, where $\tilde{L}$ is the number of significant shifts and $T$ the horizon. This result provides the first optimal guarantee in this setting.
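
To make the idea of a hierarchical discretization concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes, purely for illustration, that the action space is the interval [0, 1], that the hierarchy is dyadic (depth d splits the interval into 2^d equal bins), and that the Lipschitz constant is 1. The function names `bin_bounds`, `bin_index`, `parent`, and `max_variation` are hypothetical. The sketch shows only the two structural facts such a scheme relies on: bins are nested across depths, and under the Lipschitz assumption the reward varies by at most the bin width inside any one bin.

```python
from fractions import Fraction


def bin_bounds(depth, index):
    """Return (left, right) endpoints of a dyadic bin, as exact fractions.

    Assumption: the action space is [0, 1] and depth d has 2**d equal bins.
    """
    width = Fraction(1, 2 ** depth)
    return index * width, (index + 1) * width


def bin_index(x, depth):
    """Index of the depth-level bin that contains the action x in [0, 1]."""
    n_bins = 2 ** depth
    # The right endpoint x == 1 is assigned to the last bin.
    return min(int(x * n_bins), n_bins - 1)


def parent(index):
    """Index, one level up, of the bin that contains the given bin."""
    return index // 2


def max_variation(depth, lipschitz_const=1.0):
    """Upper bound on how much a Lipschitz reward can vary inside one bin.

    For a function with Lipschitz constant c, two points in the same bin
    differ in reward by at most c times the bin width 2**(-depth).
    """
    return lipschitz_const * 2.0 ** (-depth)


# Follow one action through successively finer levels of the hierarchy.
x = 0.3
path = [bin_index(x, depth) for depth in range(4)]
```

Coarse levels need few samples per bin but tolerate a large within-bin variation, while fine levels localize the reward better at the cost of more bins to explore. Comparing estimates across nested levels is one plausible way a change in the reward function could be detected, although the paper's actual test may differ.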

Taxonomy

Topics: Advanced Bandit Algorithms Research · Decision-Making and Behavioral Economics