Learning Contextual Bandits in a Non-stationary Environment

Qingyun Wu; Naveen Iyer; Hongning Wang

arXiv:1805.09365·cs.LG·May 25, 2018

TL;DR

This paper introduces a contextual bandit algorithm designed for non-stationary environments, capable of detecting changes and adapting its strategy, with proven regret bounds and validated effectiveness on synthetic and real-world datasets.

Contribution

It presents a novel algorithm that detects environment changes and adapts in non-stationary settings, with theoretical regret analysis and empirical validation.

Findings

01. The algorithm effectively detects environment shifts.
02. It achieves sublinear regret in changing environments.
03. Empirical results show improved recommendation performance.

Abstract

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems and in many other important real-world problems, such as display advertising. However, such algorithms usually assume a stationary reward distribution, which rarely holds in practice because users' preferences are dynamic. As a result, a recommender system is left with consistently suboptimal performance. In this paper, we consider the situation where the underlying reward distribution remains unchanged over (possibly short) epochs and shifts at unknown time instants. Accordingly, we propose a contextual bandit algorithm that detects possible changes of environment based on its reward estimation confidence and updates its arm selection strategy in response. Rigorous upper regret bound analysis of the proposed algorithm demonstrates its learning effectiveness in such…
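The idea of detecting a change from the model's own reward estimation confidence can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified, hypothetical LinUCB-style learner that flags a round as "bad" when the observed reward falls outside its confidence interval, and discards its statistics when too many recent rounds are bad. The class name, the parameters (alpha, lam, slack, window, max_bad_rate), and the noise-free two-context simulation are all assumptions made for illustration.

```python
import math
from collections import deque


class ChangeDetectingLinUCB:
    """Hypothetical sketch: ridge-regression UCB model that resets itself
    when observed rewards leave its own confidence interval too often."""

    def __init__(self, dim, alpha=1.0, lam=1.0, slack=0.1, window=20, max_bad_rate=0.5):
        self.dim, self.alpha, self.lam = dim, alpha, lam
        self.slack = slack                # tolerance added to the confidence width
        self.max_bad_rate = max_bad_rate  # reset when the bad-round rate exceeds this
        self.bad = deque(maxlen=window)   # sliding window of out-of-interval flags
        self.resets = 0
        self._reset_model()

    def _reset_model(self):
        d = self.dim
        # A_inv is the inverse of (lam * I + sum of x x^T); b is the sum of reward * x.
        self.A_inv = [[1.0 / self.lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d
        self.bad.clear()

    def _matvec(self, x):
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.A_inv]

    def theta(self):
        # Ridge-regression estimate of the reward parameter.
        return self._matvec(self.b)

    def predict(self, x):
        # Returns (estimated reward, confidence width) for context x.
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        quad = sum(a * xi for a, xi in zip(self._matvec(x), x))
        return mean, self.alpha * math.sqrt(max(0.0, quad))

    def select(self, arms):
        # Pick the arm with the largest upper confidence bound.
        scores = [sum(self.predict(x)) for x in arms]
        return max(range(len(arms)), key=scores.__getitem__)

    def update(self, x, reward):
        # Change detection: is the reward outside the model's own interval?
        mean, width = self.predict(x)
        self.bad.append(abs(reward - mean) > width + self.slack)
        if len(self.bad) == self.bad.maxlen and sum(self.bad) / len(self.bad) > self.max_bad_rate:
            self._reset_model()
            self.resets += 1
        # Sherman-Morrison rank-one update of A_inv, then update b.
        Ax = self._matvec(x)
        denom = 1.0 + sum(a * xi for a, xi in zip(Ax, x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def simulate(rounds_before=60, rounds_after=40):
    """Noise-free toy environment whose parameter flips sign at an unknown round."""
    model = ChangeDetectingLinUCB(dim=2)
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    for t in range(rounds_before + rounds_after):
        theta_true = [0.6, 0.6] if t < rounds_before else [-0.6, -0.6]
        x = contexts[t % 2]
        reward = sum(a * b for a, b in zip(theta_true, x))
        model.update(x, reward)
    return model
```

Under these assumptions, calling simulate() and inspecting model.resets and model.theta() shows the learner staying put during the stationary phase, resetting once shortly after the shift, and then re-estimating a negative parameter. A full reset is the bluntest possible response to a detected change; it is used here only to keep the sketch short.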

Code & Models

Repositories

YRussac/WeightedLinearBandits
