Adaptive Smooth Non-Stationary Bandits
Joe Suk

TL;DR
This paper establishes the minimax dynamic regret rates for smooth non-stationary bandit models, develops adaptive algorithms without prior knowledge of smoothness parameters, and explores faster gap-dependent regret rates in environments with safe arms.
Contribution
It provides the first general minimax regret bounds for all smoothness parameters in non-stationary bandits and introduces adaptive algorithms that do not require prior parameter knowledge.
Findings
Established minimax dynamic regret rates for all parameters.
Designed adaptive algorithms achieving these rates without prior knowledge.
Identified conditions under which faster gap-dependent regret rates are possible.
Abstract
We study a $K$-armed non-stationary bandit model where rewards change smoothly, as captured by Hölder class assumptions on rewards as functions of time. Such smooth changes are parametrized by a Hölder exponent $\beta$ and coefficient $\lambda$. While various sub-cases of this general model have been studied in isolation, we first establish the minimax dynamic regret rate generally for all $K, \beta, \lambda$. Next, we show this optimal dynamic regret can be attained adaptively, without knowledge of $\beta, \lambda$. To contrast, even with parameter knowledge, upper bounds were only previously known for limited regimes $\beta \leq 1$ and $\beta = 2$ (Slivkins, 2014; Krishnamurthy and Gopalan, 2021; Manegueu et al., 2021; Jia et al., 2023). Thus, our work resolves open questions raised by these disparate threads of the literature. We also study the problem of attaining faster…
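For readers unfamiliar with the setup, the two central objects in the abstract can be written out as follows. This is a sketch using the standard textbook definitions for exponents $\beta \leq 1$; the symbols $\mu_a$, $f_a$, $a_t$, and the rescaling of time to $[0,1]$ are assumptions made for illustration and may differ from the paper's exact formulation.

```latex
% Hölder condition with exponent \beta \in (0,1] and coefficient \lambda,
% assuming (hypothetically) that arm a has mean reward \mu_a(t) = f_a(t/T)
% for some function f_a defined on [0,1]:
\[
  |f_a(x) - f_a(y)| \le \lambda \, |x - y|^{\beta}
  \quad \text{for all } x, y \in [0,1].
\]
% Dynamic regret over T rounds compares the played arm a_t
% with the best arm at each individual round:
\[
  R_T = \sum_{t=1}^{T} \Bigl( \max_{a \in \{1, \dots, K\}} \mu_a(t) - \mu_{a_t}(t) \Bigr).
\]
```

For $\beta > 1$, the standard Hölder class instead imposes this condition on the derivative of order $m$, where $m$ is the largest integer strictly less than $\beta$, with exponent $\beta - m$.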
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Data Stream Mining Techniques
