Weighted Linear Bandits for Non-Stationary Environments
Yoan Russac (DI-ENS, VALDA), Claire Vernade, Olivier Cappé (DI-ENS, VALDA)

TL;DR
This paper introduces D-LinUCB, an algorithm for non-stationary linear bandit problems that adapts to changing environments using discounted regression, providing optimal regret bounds and demonstrating strong empirical performance.
Contribution
The paper proposes D-LinUCB, a novel optimistic algorithm based on discounted linear regression for non-stationary environments, and proves optimal regret bounds for it.
Findings
D-LinUCB achieves a dynamic regret upper bound of order d^{2/3} B_T^{1/3} T^{2/3}.
The algorithm performs well in both slowly-varying and abruptly-changing environments.
Novel deviation results for weighted least-squares estimators are derived.
Abstract
We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order d^{2/3} B_T^{1/3} T^{2/3}, where B_T is a…
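The idea of discounted linear regression with optimistic action selection can be sketched as follows. This is a minimal illustration, not the authors' reference implementation: the class name `DiscountedLinUCB`, its method names, and the fixed exploration coefficient `beta` are assumptions made for this example, and the time-dependent confidence radius analysed in the paper is not reproduced. Past observations are down-weighted by a factor `gamma` at every step, and the exploration bonus uses a second matrix discounted by `gamma**2`, which is how weighted least-squares deviations are typically controlled.

```python
import numpy as np


class DiscountedLinUCB:
    """Sketch of an optimistic bandit built on discounted ridge regression.

    gamma: discount factor in (0, 1]; gamma = 1 means no forgetting.
    lam:   ridge regularization strength.
    beta:  exploration coefficient (hypothetical constant; the paper
           uses a time-dependent confidence radius instead).
    """

    def __init__(self, dim, gamma=0.99, lam=1.0, beta=1.0):
        self.gamma = gamma
        self.lam = lam
        self.beta = beta
        self.V = lam * np.eye(dim)        # discounted design matrix
        self.V_tilde = lam * np.eye(dim)  # same, with squared weights
        self.b = np.zeros(dim)            # discounted sum of reward * action

    def theta_hat(self):
        # Weighted least-squares estimate of the current parameter.
        return np.linalg.solve(self.V, self.b)

    def select(self, actions):
        # actions: array of shape (n_actions, dim).
        V_inv = np.linalg.inv(self.V)
        theta = V_inv @ self.b
        M = V_inv @ self.V_tilde @ V_inv
        # Per-action width sqrt(a^T M a).
        widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, M, actions))
        return int(np.argmax(actions @ theta + self.beta * widths))

    def update(self, action, reward):
        g, lam = self.gamma, self.lam
        outer = np.outer(action, action)
        eye = np.eye(len(action))
        # Discount the past, add the new sample, and top the ridge
        # term back up so that it stays equal to lam * I.
        self.V = g * self.V + outer + (1 - g) * lam * eye
        self.V_tilde = g**2 * self.V_tilde + outer + (1 - g**2) * lam * eye
        self.b = g * self.b + reward * action
```

A typical loop would call `select` on the current action set, observe a reward, and pass the chosen action vector and reward to `update`. Smaller values of `gamma` forget faster, which helps after abrupt changes but increases the variance of the estimate in stable periods.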
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Data Stream Mining Techniques
Methods: Linear Regression
