On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems
Lai Wei, Vaibhav Srivastava

TL;DR
This paper introduces two algorithms for the non-stationary multiarmed bandit problem and shows that both achieve sublinear expected cumulative regret in abruptly-changing and slowly-varying environments, supported by theoretical analysis and numerical illustrations.
Contribution
The paper proposes the LM-DSEE and SW-UCB# algorithms for non-stationary environments and derives upper bounds on their expected cumulative regret.
Findings
Expected cumulative regret of both algorithms is upper bounded by sublinear functions of time in both environments.
Algorithms adapt effectively to abrupt and gradual changes.
Numerical results support theoretical guarantees.
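To make "sublinear" concrete, the regret notion can be written out as below. The notation is an assumption for illustration and is not taken from the paper: $\mu_i(t)$ denotes the mean reward of arm $i$ at time $t$, $a_t$ the arm selected at time $t$, and $R(T)$ the expected cumulative regret up to horizon $T$.

```latex
% Assumed notation: \mu_i(t) = mean reward of arm i at time t, a_t = arm played at time t.
R(T) = \sum_{t=1}^{T} \mathbb{E}\!\left[ \max_{i} \mu_i(t) - \mu_{a_t}(t) \right],
\qquad
\text{sublinear regret:} \quad \lim_{T \to \infty} \frac{R(T)}{T} = 0 .
% Example: any bound of the form R(T) = O(T^{\alpha}) with \alpha < 1 satisfies this limit.
```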
Abstract
We study the non-stationary stochastic multiarmed bandit (MAB) problem and propose two generic algorithms, namely, the limited memory deterministic sequencing of exploration and exploitation (LM-DSEE) and the Sliding-Window Upper Confidence Bound# (SW-UCB#). We rigorously analyze these algorithms in abruptly-changing and slowly-varying environments and characterize their performance. We show that the expected cumulative regret for these algorithms under either of the environments is upper bounded by sublinear functions of time, i.e., the time average of the regret asymptotically converges to zero. We complement our analytic results with numerical illustrations.
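The sliding-window idea can be illustrated with a minimal sketch. This is a generic fixed-window sliding-window UCB and not the paper's SW-UCB#, whose window schedule and confidence term are not given in this summary. The function names, the exploration constant `xi`, the window length, and the two-armed Bernoulli test environment are all hypothetical choices made for illustration.

```python
import math
import random
from collections import deque


def sliding_window_ucb(reward_fn, n_arms, horizon, window, xi=2.0, seed=0):
    """Generic fixed-window sliding-window UCB (illustrative; not SW-UCB#)."""
    rng = random.Random(seed)
    history = deque()        # (arm, reward) pairs currently inside the window
    counts = [0] * n_arms    # pulls per arm within the window
    sums = [0.0] * n_arms    # reward sums per arm within the window
    choices = []
    for t in range(horizon):
        # Forget the oldest observation once the window is full.
        if len(history) == window:
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward
        unseen = [a for a in range(n_arms) if counts[a] == 0]
        if unseen:
            # Play any arm that has no samples left in the window.
            arm = unseen[0]
        else:
            n = len(history)
            # Windowed empirical mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(xi * math.log(n) / counts[a]),
            )
        reward = reward_fn(arm, t, rng)
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)
    return choices


def abrupt_bernoulli(arm, t, rng):
    """Hypothetical abruptly-changing environment: the best arm swaps at t = 1000."""
    means = (0.9, 0.1) if t < 1000 else (0.1, 0.9)
    return 1.0 if rng.random() < means[arm] else 0.0
```

Calling `sliding_window_ucb(abrupt_bernoulli, n_arms=2, horizon=2000, window=100)` returns the sequence of chosen arms. Because old samples are discarded, the estimates track the swap at t = 1000 rather than averaging over both regimes, which is the motivation for limiting memory in non-stationary settings.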
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
