A Simple Approach for Non-stationary Linear Bandits
Peng Zhao, Lijun Zhang, Yuan Jiang, Zhi-Hua Zhou

TL;DR
This paper corrects previous theoretical results on non-stationary linear bandits, proposes a simple restarted UCB algorithm that attains the same corrected regret bound, and validates the approach empirically.
Contribution
It identifies a flaw in existing analyses, provides a corrected regret bound, and shows that a simple restarted strategy achieves the same regret guarantee without sophisticated mechanisms such as sliding windows or weighted penalties.
Findings
Corrected the regret analysis for non-stationary linear bandits.
Proposed a simple restarted UCB algorithm achieving $\tilde{O}(T^{3/4}P_T^{1/4})$ regret.
Empirical results confirm the effectiveness of the proposed approach.
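The restarted strategy named in the findings can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a finite arm set, and the restart period `H`, the confidence width `beta`, the regularizer `lam`, and the simulator callable `theta_fn` are all hypothetical names and untuned values chosen for illustration.

```python
import numpy as np

def restarted_linucb(arms, theta_fn, T, H, lam=1.0, beta=1.0, noise=0.1, seed=0):
    """Sketch of a periodically restarted UCB-type linear bandit.

    arms:     (K, d) array of feature vectors (assumption: finite, fixed arm set).
    theta_fn: hypothetical callable t -> true parameter at round t (simulation only).
    H, beta:  restart period and confidence width (illustrative, not the paper's tuning).
    Returns the cumulative dynamic regret over T rounds.
    """
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    regret = 0.0
    for t in range(T):
        if t % H == 0:
            # Restart: discard all past data so stale observations cannot bias the estimate.
            V = lam * np.eye(d)
            b = np.zeros(d)
        # Ridge-regression estimate from data collected since the last restart.
        theta_hat = np.linalg.solve(V, b)
        V_inv = np.linalg.inv(V)
        # Exploration bonus: ||x||_{V^{-1}} for every arm.
        widths = np.sqrt(np.einsum("kd,de,ke->k", arms, V_inv, arms))
        x = arms[np.argmax(arms @ theta_hat + beta * widths)]
        theta_t = theta_fn(t)
        reward = x @ theta_t + noise * rng.standard_normal()
        V += np.outer(x, x)
        b += reward * x
        # Dynamic regret compares against the best arm for the current parameter.
        regret += np.max(arms @ theta_t) - x @ theta_t
    return regret
```

In this sketch a shorter `H` forgets outdated data faster at the cost of more re-exploration, which is the trade-off a restart period has to balance.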
Abstract
This paper investigates the problem of non-stationary linear bandits, where the unknown regression parameter is evolving over time. Existing studies develop various algorithms and show that they enjoy an $\tilde{O}(T^{2/3}P_T^{1/3})$ dynamic regret, where $T$ is the time horizon and $P_T$ is the path-length that measures the fluctuation of the evolving unknown parameter. In this paper, we discover that a serious technical flaw makes their results ungrounded, and then present a fix, which gives an $\tilde{O}(T^{3/4}P_T^{1/4})$ dynamic regret without modifying original algorithms. Furthermore, we demonstrate that instead of using sophisticated mechanisms, such as sliding window or weighted penalty, a simple restarted strategy is sufficient to attain the same regret guarantee. Specifically, we design a UCB-type algorithm to balance exploitation and exploration,…
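For readers unfamiliar with the two quantities in the abstract, they are commonly formalized as follows. The notation is an assumption introduced here for illustration: $\mathcal{X}$ is the feasible arm set, $X_t$ the arm played at round $t$, and $\theta_t$ the unknown parameter at round $t$.

$$\text{D-Regret}_T = \sum_{t=1}^{T} \max_{x \in \mathcal{X}} x^\top \theta_t - \sum_{t=1}^{T} X_t^\top \theta_t, \qquad P_T = \sum_{t=2}^{T} \lVert \theta_{t-1} - \theta_t \rVert_2$$

Under this reading, a stationary problem has a path-length of zero, and larger $P_T$ means the comparator changes more over the horizon.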
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
