Linear Bandits with Memory: from Rotting to Rising
Giulia Clerici, Pierre Laforgue, Nicolò Cesa-Bianchi

TL;DR
This paper introduces a nonstationary linear bandit model incorporating memory effects, analyzing regret bounds and proposing algorithms for both known and unknown parameters, with empirical validation.
Contribution
It develops a novel nonstationary linear bandit model with memory, providing regret analysis and algorithms for unknown parameters, extending the applicability of bandit methods to dynamic environments.
Findings
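Regret bound of order $\tilde{O}(\text{poly}(d, m, \gamma, T))$ for the proposed algorithm, in terms of the dimension $d$, the window size $m$, the rotting/rising exponent $\gamma$, and the horizon $T$.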
Algorithm performs well in experiments against natural baselines.
Model captures both rotting and rising phenomena in nonstationary bandit settings.
Abstract
Nonstationary phenomena, such as satiation effects in recommendations, have mostly been modeled using bandits with finitely many arms. However, the richer action space provided by linear bandits is often preferred in practice. In this work, we introduce a novel nonstationary linear bandit model, where current rewards are influenced by the learner's past actions in a fixed-size window. Our model, which recovers stationary linear bandits as a special case, leverages two parameters: the window size $m$, and an exponent $\gamma$ that captures the rotting ($\gamma < 0$) or rising ($\gamma > 0$) nature of the phenomenon. When both $m$ and $\gamma$ are known, we propose and analyze a variant of OFUL which minimizes regret against cycling policies. By choosing the cycle length so as to trade off approximation and estimation errors, we then prove a bound of order…
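To make the windowed-memory idea concrete, here is a minimal Python sketch of a toy environment in which the reward of the current action depends on the last m actions played. The scaling rule used here, multiplying the stationary reward by (1 + sum of squared inner products with the window) raised to the power gamma, is an assumption chosen for illustration and is not taken from the paper; the class name `MemoryLinearEnv` and the helper `play` are likewise hypothetical. Noise is omitted to keep the behavior easy to inspect.

```python
from collections import deque


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


class MemoryLinearEnv:
    """Toy noiseless linear bandit with a fixed-size memory window.

    ASSUMPTION (illustrative, not the paper's definition): the stationary
    reward <a, theta> is rescaled by (1 + sum_s <a, a_s>^2) ** gamma,
    where the sum runs over the last m actions played.
    """

    def __init__(self, theta, m, gamma):
        self.theta = list(theta)
        self.gamma = gamma
        # A deque with maxlen=m keeps only the m most recent actions;
        # with m = 0 it stays empty, so the model has no memory.
        self.window = deque(maxlen=m)

    def step(self, action):
        # Overlap between the current action and the remembered actions.
        overlap = sum(dot(action, past) ** 2 for past in self.window)
        reward = (1.0 + overlap) ** self.gamma * dot(action, self.theta)
        self.window.append(list(action))
        return reward


def play(env, actions):
    """Play a sequence of actions and return the list of rewards."""
    return [env.step(a) for a in actions]


theta = [1.0, 0.5]
repeat = [[1.0, 0.0]] * 4  # play the same action four times

rotting = play(MemoryLinearEnv(theta, m=2, gamma=-1.0), repeat)
rising = play(MemoryLinearEnv(theta, m=2, gamma=1.0), repeat)
stationary = play(MemoryLinearEnv(theta, m=2, gamma=0.0), repeat)
no_memory = play(MemoryLinearEnv(theta, m=0, gamma=-1.0), repeat)
```

Under this assumed rule, repeating one action yields shrinking rewards when gamma is negative (rotting), growing rewards when gamma is positive (rising), and constant rewards when gamma is 0 or m is 0, which mirrors the stationary special case mentioned in the abstract.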
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Smart Grid Energy Management
