A Simple Approach for Non-stationary Linear Bandits

Peng Zhao; Lijun Zhang; Yuan Jiang; Zhi-Hua Zhou

arXiv:2103.05324 · cs.LG · December 23, 2021 · 30 cites

Open Access

TL;DR

This paper identifies a flaw in previous regret analyses for non-stationary linear bandits, gives a corrected bound for the existing algorithms, and shows that a simple restarted UCB algorithm attains the same regret guarantee, with empirical validation of its effectiveness.

Contribution

It identifies a flaw in existing analyses, provides a corrected regret bound, and introduces a simple restarted strategy that matches this bound without sophisticated mechanisms such as sliding windows or weighted penalties.

Findings

01

Corrected the regret analysis for non-stationary linear bandits.

02

Proposed a simple restarted UCB algorithm achieving $\tilde{O}(T^{3/4}P_T^{1/4})$ regret.

03

Empirical results confirm the effectiveness of the proposed approach.
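
As a rough guide to reading the bound in finding 02, the two quantities involved are commonly defined as sketched below. The notation is an assumption made for illustration and may differ from the paper's: $\theta_t$ is the unknown parameter at round $t$, $X_t$ the arm played, and $\mathcal{X}$ the arm set. The last line compares the corrected rate with the $T^{2/3}P_T^{1/3}$ rate quoted in the abstract.

```latex
% Path-length of the drifting parameter and dynamic regret (assumed notation)
P_T = \sum_{t=2}^{T} \lVert \theta_{t-1} - \theta_t \rVert_2 ,
\qquad
\text{D-Regret}_T = \sum_{t=1}^{T} \max_{x \in \mathcal{X}} x^\top \theta_t
                  - \sum_{t=1}^{T} X_t^\top \theta_t .

% Ratio of the corrected rate to the previously claimed rate
\frac{T^{3/4} P_T^{1/4}}{T^{2/3} P_T^{1/3}}
  = T^{\frac{3}{4}-\frac{2}{3}} \, P_T^{\frac{1}{4}-\frac{1}{3}}
  = \left( \frac{T}{P_T} \right)^{1/12} .
```

Under these definitions the corrected rate is larger than the previously claimed one by a factor of $(T/P_T)^{1/12}$, which is at least 1 whenever $P_T \le T$.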

Abstract

This paper investigates the problem of non-stationary linear bandits, where the unknown regression parameter is evolving over time. Existing studies develop various algorithms and show that they enjoy an $O(T^{2/3} P_{T}^{1/3})$ dynamic regret, where $T$ is the time horizon and $P_{T}$ is the path-length that measures the fluctuation of the evolving unknown parameter. In this paper, we discover that a serious technical flaw makes their results ungrounded, and then present a fix, which gives an $O(T^{3/4} P_{T}^{1/4})$ dynamic regret without modifying original algorithms. Furthermore, we demonstrate that instead of using sophisticated mechanisms, such as sliding window or weighted penalty, a simple restarted strategy is sufficient to attain the same regret guarantee. Specifically, we design a UCB-type algorithm to balance exploitation and exploration,…
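
The restarted strategy described in the abstract can be illustrated with a minimal sketch: a LinUCB-style learner whose statistics are discarded every fixed number of rounds. This is not the authors' implementation. The class name `RestartedLinUCB`, the constant exploration width `beta`, and the user-supplied `restart_period` are all hypothetical choices made for illustration; the paper's own confidence radius and restart schedule are not reproduced here.

```python
import math


def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


class RestartedLinUCB:
    """Hypothetical sketch: ridge-regression UCB, reset every `restart_period` rounds."""

    def __init__(self, dim, restart_period, reg=1.0, beta=1.0):
        self.dim = dim
        self.restart_period = restart_period  # epoch length (assumed given by the user)
        self.reg = reg                        # ridge regularization weight
        self.beta = beta                      # exploration width (assumed constant)
        self.reset()

    def reset(self):
        # V_inv is the inverse of V = reg * I + sum of x x^T over the current epoch.
        self.V_inv = [[1.0 / self.reg if i == j else 0.0 for j in range(self.dim)]
                      for i in range(self.dim)]
        self.b = [0.0] * self.dim             # sum of reward * x over the current epoch
        self.steps_since_restart = 0

    def select(self, arms):
        """Return the index of the arm with the largest upper confidence bound."""
        theta_hat = mat_vec(self.V_inv, self.b)  # ridge estimate from the current epoch

        def ucb(x):
            width = math.sqrt(dot(x, mat_vec(self.V_inv, x)))
            return dot(x, theta_hat) + self.beta * width

        return max(range(len(arms)), key=lambda k: ucb(arms[k]))

    def update(self, x, reward):
        """Record one observation, then restart if the epoch is over."""
        # Sherman-Morrison rank-one update of V_inv (V_inv stays symmetric).
        Vx = mat_vec(self.V_inv, x)
        denom = 1.0 + dot(x, Vx)
        self.V_inv = [[self.V_inv[i][j] - Vx[i] * Vx[j] / denom
                       for j in range(self.dim)] for i in range(self.dim)]
        self.b = [b_i + reward * x_i for b_i, x_i in zip(self.b, x)]
        self.steps_since_restart += 1
        if self.steps_since_restart >= self.restart_period:
            self.reset()  # forget old data so a drifted parameter is re-estimated
```

In use, one would call `select` on the current arm set, play the returned arm, and pass the played arm and observed reward to `update`. A shorter `restart_period` tracks drift faster at the cost of re-exploring more often, which is the trade-off the restart schedule has to balance.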

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics