On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems
Aurélien Garivier (LTCI), Eric Moulines (LTCI)

TL;DR
This paper analyzes non-stationary multi-armed bandit algorithms, specifically discounted and sliding-window UCB, providing regret bounds and demonstrating near-optimal performance despite abrupt environment changes.
Contribution
The paper establishes upper bounds on the regret of the discounted and sliding-window UCB algorithms in non-stationary settings with abrupt changes, and shows that these bounds match the lower bound up to a logarithmic factor.
Findings
Both algorithms achieve near-optimal regret bounds.
A Hoeffding-type inequality for self-normalized deviations is derived.
Regret lower bounds are established for abrupt changes.
Abstract
Multi-armed bandit problems are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, where the distributions of the rewards do not change in time, Upper-Confidence Bound (UCB) policies have been shown to be rate optimal. A challenging variant of the multi-armed bandit problem (MABP) is the non-stationary bandit problem, where the gambler must decide which arm to play while facing the possibility of a changing environment. In this paper, we consider the situation where the distributions of rewards remain constant over epochs and change at unknown time instants. We analyze two algorithms: the discounted UCB and the sliding-window UCB. We establish for these two algorithms an upper bound for the expected regret by upper-bounding the expectation of the number of times a suboptimal arm is played. For that purpose, we…
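The two policies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `pull(arm, t)` callback, the toy environment `shifting_rewards`, and the default constants `bound` and `xi` are all assumptions made for illustration, and the exploration bonuses follow the commonly cited form of these policies rather than anything stated on this page. The idea shown is only that the sliding-window variant computes its averages over the most recent `window` plays, while the discounted variant down-weights every past observation by a factor `gamma` at each round.

```python
import math
from collections import deque


def sliding_window_ucb(pull, n_arms, horizon, window, bound=1.0, xi=0.6):
    """Sliding-window UCB sketch: statistics use only the last `window` plays."""
    history = deque()          # (arm, reward) pairs currently inside the window
    counts = [0] * n_arms      # plays of each arm inside the window
    sums = [0.0] * n_arms      # reward totals of each arm inside the window
    choices = []
    for t in range(1, horizon + 1):
        unseen = [a for a in range(n_arms) if counts[a] == 0]
        if unseen:
            arm = unseen[0]    # an arm with no data in the window is played first
        else:
            log_term = math.log(min(t, window))
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + bound * math.sqrt(xi * log_term / counts[a]),
            )
        reward = pull(arm, t)
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward
        if len(history) > window:   # forget the observation that left the window
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward
            if counts[old_arm] == 0:
                sums[old_arm] = 0.0  # clear floating-point residue
        choices.append(arm)
    return choices


def discounted_ucb(pull, n_arms, horizon, gamma, bound=1.0, xi=0.6):
    """Discounted UCB sketch: past observations are weighted by powers of `gamma`."""
    counts = [0.0] * n_arms    # discounted play counts
    sums = [0.0] * n_arms      # discounted reward totals
    choices = []
    for t in range(1, horizon + 1):
        unseen = [a for a in range(n_arms) if counts[a] == 0.0]
        if unseen:
            arm = unseen[0]
        else:
            log_term = max(0.0, math.log(sum(counts)))
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + 2.0 * bound * math.sqrt(xi * log_term / counts[a]),
            )
        reward = pull(arm, t)
        counts = [gamma * c for c in counts]   # discount all past plays
        sums = [gamma * s for s in sums]
        counts[arm] += 1.0                     # newest play has weight 1
        sums[arm] += reward
        choices.append(arm)
    return choices


def shifting_rewards(arm, t, change_point=1000):
    """Hypothetical noiseless environment: arm 0 is best until `change_point`."""
    if arm == 0:
        return 0.9 if t <= change_point else 0.1
    return 0.5
```

Calling `sliding_window_ucb(shifting_rewards, n_arms=2, horizon=2000, window=200)` or `discounted_ucb(shifting_rewards, n_arms=2, horizon=2000, gamma=0.99)` returns the list of arms played at each round. In this toy setting both policies should mostly play arm 0 before the change and shift to arm 1 once the old observations have left the window or been discounted away. The values of `window` and `gamma` here are arbitrary; choosing them well depends on how often the environment changes.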
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
