Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

Yang Cao, Zheng Wen, Branislav Kveton, and Yao Xie

arXiv:1802.03692·stat.ML·January 25, 2019·48 cites

TL;DR

This paper introduces M-UCB, an algorithm that integrates change detection with UCB for piecewise-stationary bandits. It achieves a nearly optimal regret bound and outperforms existing methods in numerical experiments.

Contribution

The paper proposes a simple change-detection component integrated with classic UCB, achieving a nearly optimal regret bound for piecewise-stationary bandit problems.

Findings

01

M-UCB achieves regret of order $O(\sqrt{MKT\log T})$, which matches the best available lower bound in $T$ up to a logarithmic factor.

02

M-UCB outperforms state-of-the-art algorithms in numerical experiments.

03

The method effectively detects and adapts to changes in reward distributions.

Abstract

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where the reward distributions may change in a piecewise-stationary fashion at unknown time steps. We show that by incorporating a simple change-detection component with classic UCB algorithms to detect and adapt to changes, our so-called M-UCB algorithm can achieve nearly optimal regret bound on the order of $O(\sqrt{MKT\log T})$, where $T$ is the number of time steps, $K$ is the number of arms, and $M$ is the number of stationary segments. Comparison with the best available lower bound shows that our M-UCB is nearly optimal in $T$ up to a logarithmic factor. We also compare M-UCB with the state-of-the-art algorithms in numerical experiments using a public…
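
The abstract describes the approach only at a high level: a UCB learner plus a change detector that triggers adaptation. The Python sketch below illustrates one way such a combination could look. It is not taken from this page and should not be read as the authors' exact procedure. The assumptions are: rewards lie in [0, 1]; the detector compares the sums of the two halves of the last `w` rewards of an arm against a threshold `b`; a fraction `gamma` of the steps is spent cycling through all arms so that changes on rarely pulled arms remain observable; and a detection resets all statistics. The names `m_ucb`, `change_detected`, `make_env`, `w`, `b`, and `gamma` are illustrative choices, as is the two-arm test environment with a single change at step 2000.

```python
import math
import random
from collections import deque


def change_detected(window, b):
    """Return True if the sums of the two halves of `window` differ by more than b."""
    half = len(window) // 2
    return abs(sum(window[half:]) - sum(window[:half])) > b


def m_ucb(pull, K, T, w, b, gamma):
    """UCB with windowed change detection and restarts (illustrative sketch).

    pull(t, arm) returns a reward in [0, 1]; w is assumed to be even.
    Returns the list of chosen arms and the list of restart times.
    """
    period = max(K, math.floor(K / gamma))  # length of one forced-exploration cycle
    tau = 0                                 # time of the most recent restart
    counts = [0] * K                        # pulls per arm since tau
    sums = [0.0] * K                        # reward totals per arm since tau
    recent = [deque(maxlen=w) for _ in range(K)]  # last w rewards per arm
    choices, restarts = [], []

    for t in range(1, T + 1):
        slot = (t - tau - 1) % period
        if slot < K:
            # Forced exploration: the first K slots of each cycle visit every arm once.
            arm = slot
        else:
            # UCB1-style index computed only from data gathered since the last restart.
            n = t - tau
            arm = max(
                range(K),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(n) / counts[k]),
            )

        reward = pull(t, arm)
        counts[arm] += 1
        sums[arm] += reward
        recent[arm].append(reward)
        choices.append(arm)

        # Test the pulled arm once its window is full; on detection, forget everything.
        if len(recent[arm]) == w and change_detected(list(recent[arm]), b):
            tau = t
            counts = [0] * K
            sums = [0.0] * K
            recent = [deque(maxlen=w) for _ in range(K)]
            restarts.append(t)

    return choices, restarts


def make_env(seed=0, change_at=2000):
    """Hypothetical two-arm Bernoulli environment whose means swap after `change_at`."""
    rng = random.Random(seed)

    def pull(t, arm):
        means = (0.9, 0.1) if t <= change_at else (0.1, 0.9)
        return 1.0 if rng.random() < means[arm] else 0.0

    return pull


choices, restarts = m_ucb(make_env(), K=2, T=4000, w=100, b=30.0, gamma=0.05)
print("restart times:", restarts)
print("share of arm 1 in the last 1000 steps:", sum(choices[-1000:]) / 1000)
```

In this toy setting the threshold `b=30.0` sits well above the noise of a 50-sample half-window of Bernoulli rewards and below the expected shift of about 40 caused by the mean swap, which is why a restart is expected shortly after step 2000. In practice `w`, `b`, and `gamma` would need to be tuned to the horizon, the number of arms, and the smallest change worth detecting.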

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms