On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems

Lai Wei; Vaibhav Srivastava

arXiv:1802.08380·stat.ML·April 25, 2018·6 cites

Open Access

TL;DR

This paper introduces two algorithms for non-stationary multiarmed bandit problems, demonstrating they achieve sublinear regret in both abruptly-changing and slowly-varying environments, with theoretical analysis and numerical validation.

Contribution

The paper proposes two algorithms for non-stationary environments, LM-DSEE and SW-UCB#, and derives sublinear upper bounds on their expected cumulative regret in both abruptly-changing and slowly-varying settings.

Findings

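01 Expected cumulative regret is upper bounded by a sublinear function of time in both abruptly-changing and slowly-varying environments.

02 Both algorithms adapt to abrupt as well as gradual changes in the reward distributions.

03 Numerical illustrations complement the analytic results.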
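
The first finding can be unpacked in one line. As a sketch, assume a bound of the form R(T) <= C T^nu on the expected cumulative regret R(T), where the constants C > 0 and 0 <= nu < 1 are placeholders chosen for illustration and are not the paper's specific constants. Since expected regret measured against the best arm at each time is nonnegative, the time-averaged regret is squeezed to zero:

```latex
\[
0 \;\le\; \frac{R(T)}{T} \;\le\; \frac{C\,T^{\nu}}{T} \;=\; C\,T^{\nu-1}
\;\xrightarrow[T\to\infty]{}\; 0
\qquad \text{since } \nu - 1 < 0 .
\]
```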

Abstract

We study the non-stationary stochastic multiarmed bandit (MAB) problem and propose two generic algorithms, namely, the limited memory deterministic sequencing of exploration and exploitation (LM-DSEE) and the Sliding-Window Upper Confidence Bound# (SW-UCB#). We rigorously analyze these algorithms in abruptly-changing and slowly-varying environments and characterize their performance. We show that the expected cumulative regret for these algorithms under either of the environments is upper bounded by sublinear functions of time, i.e., the time average of the regret asymptotically converges to zero. We complement our analytic results with numerical illustrations.
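
To make the sliding-window idea concrete, here is a minimal sketch of a generic fixed-window sliding-window UCB rule. It is not the paper's SW-UCB#, whose window schedule and constants are not given on this page. The function name `sliding_window_ucb`, the `reward_fn(arm, t)` callback, the fixed `window` length, and the exploration constant `c` are all assumptions made for illustration.

```python
import math
from collections import deque


def sliding_window_ucb(reward_fn, n_arms, horizon, window, c=2.0):
    """Generic fixed-window UCB sketch (illustrative; not the paper's SW-UCB#).

    reward_fn(arm, t) is a hypothetical callback returning the reward of
    pulling `arm` at round `t`. Returns the list of arms chosen.
    """
    history = deque()        # (arm, reward) pairs currently inside the window
    counts = [0] * n_arms    # pulls per arm inside the window
    sums = [0.0] * n_arms    # reward sum per arm inside the window
    choices = []

    for t in range(horizon):
        # Forget the oldest observation once the window is full.
        if len(history) == window:
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward

        # Pull any arm with no observation left in the window first.
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            n = len(history)
            # Windowed mean plus a confidence bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * math.log(n) / counts[a]),
            )

        reward = reward_fn(arm, t)
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)

    return choices
```

Because only the most recent `window` observations enter the estimates, stale rewards from before a change point drop out after at most `window` rounds. A shorter window reacts faster to changes but yields noisier estimates, which is the trade-off a window schedule has to balance.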

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics