On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

Aurélien Garivier (LTCI); Eric Moulines (LTCI)

arXiv:0805.3415·math.ST·December 18, 2008·183 cites

TL;DR

This paper analyzes non-stationary multi-armed bandit algorithms, specifically discounted and sliding-window UCB, providing regret bounds and demonstrating near-optimal performance despite abrupt environment changes.

Contribution

It establishes upper bounds on the expected regret of the discounted and sliding-window UCB algorithms in non-stationary settings with abrupt changes. These bounds match the lower bound up to a logarithmic factor.

Findings

01 Both algorithms achieve near-optimal regret bounds.

02 A Hoeffding-type inequality for self-normalized deviations is derived.

03 Regret lower bounds are established for abrupt changes.

Abstract

Multi-armed bandit problems (MABP) are considered as a paradigm of the trade-off between exploring the environment to find profitable actions and exploiting what is already known. In the stationary case, where the distributions of the rewards do not change over time, Upper-Confidence Bound (UCB) policies have been shown to be rate optimal. A challenging variant of the MABP is the non-stationary bandit problem, where the gambler must decide which arm to play while facing the possibility of a changing environment. In this paper, we consider the situation where the distributions of rewards remain constant over epochs and change at unknown time instants. We analyze two algorithms: the discounted UCB and the sliding-window UCB. We establish for these two algorithms an upper bound for the expected regret by upper-bounding the expected number of times a suboptimal arm is played. For that purpose, we…
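The two policies named in the abstract can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: the class names, the `simulate` helper, the exploration constant `xi=0.6`, the reward bound `bound=1.0`, and the exact form of the padding terms are all choices made for this sketch and should be checked against the paper. The sliding-window variant computes its index from the last `window` plays only, while the discounted variant down-weights every past play by a factor `gamma` per step.

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """UCB index computed from only the last `window` plays (illustrative sketch)."""

    def __init__(self, n_arms, window, bound=1.0, xi=0.6):
        self.n_arms = n_arms
        self.bound = bound  # assumed upper bound on rewards
        self.xi = xi        # exploration constant (illustrative value)
        self.history = deque(maxlen=window)  # (arm, reward) pairs, oldest dropped first

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # Any arm absent from the current window is played first.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        span = len(self.history)  # equals min(t, window)

        def index(arm):
            padding = self.bound * math.sqrt(self.xi * math.log(span) / counts[arm])
            return sums[arm] / counts[arm] + padding

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.history.append((arm, reward))


class DiscountedUCB:
    """UCB index computed from geometrically discounted statistics (illustrative sketch)."""

    def __init__(self, n_arms, gamma, bound=1.0, xi=0.6):
        self.n_arms = n_arms
        self.gamma = gamma  # discount factor in (0, 1)
        self.bound = bound
        self.xi = xi
        self.counts = [0.0] * n_arms  # discounted number of plays per arm
        self.sums = [0.0] * n_arms    # discounted sum of rewards per arm

    def select(self):
        for arm in range(self.n_arms):
            if self.counts[arm] == 0.0:
                return arm
        # Guard against a non-positive logarithm at very early steps.
        log_total = max(math.log(sum(self.counts)), 0.0)

        def index(arm):
            padding = 2.0 * self.bound * math.sqrt(self.xi * log_total / self.counts[arm])
            return self.sums[arm] / self.counts[arm] + padding

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        # Discount all past statistics, then add the newest observation at full weight.
        for a in range(self.n_arms):
            self.counts[a] *= self.gamma
            self.sums[a] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def simulate(policy, means_before, means_after, change_point, horizon, seed=0):
    """Run a policy on Bernoulli arms whose means switch once at `change_point`."""
    rng = random.Random(seed)
    choices = []
    for t in range(horizon):
        means = means_before if t < change_point else means_after
        arm = policy.select()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        policy.update(arm, reward)
        choices.append(arm)
    return choices
```

For example, `simulate(SlidingWindowUCB(2, window=50), (0.9, 0.5), (0.1, 0.5), 500, 1000)` returns the sequence of arms played when the first arm abruptly degrades halfway through the run. The window length and the discount factor control how quickly old observations are forgotten; the values used here are arbitrary and are not tuned to the horizon or to the number of changes.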

Code & Models

Repositories

jehankairasvakharia/Santa_2020_Jehan

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics