Efficient Contextual Bandits in Non-stationary Worlds

Haipeng Luo; Chen-Yu Wei; Alekh Agarwal; John Langford

arXiv:1708.01799 · cs.LG · April 5, 2019 · 77 cites

TL;DR

This paper introduces efficient algorithms for contextual bandits in non-stationary environments. The algorithms adapt dynamically to distribution changes and achieve near-optimal regret bounds, including the first dynamic regret guarantees for an efficient algorithm in fully adversarial environments.

Contribution

Equips existing efficient contextual bandit algorithms for i.i.d. settings with statistical tests that detect distribution changes, yielding algorithms for non-stationary settings and the first dynamic regret guarantees for an efficient algorithm in fully adversarial environments.
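The general pattern of wrapping an i.i.d. learner with a change test can be illustrated with a deliberately simple stand-in. This is a minimal sketch, not the test used in the paper: it assumes rewards in [0, 1], compares the mean of a recent window against the mean of all earlier data since the last restart using Hoeffding confidence radii, and throws away history when the gap is too large. The names `hoeffding_radius`, `change_detected`, `run_with_restarts`, `window`, and `delta` are hypothetical.

```python
import math
from typing import Iterable, List


def hoeffding_radius(n: int, delta: float, value_range: float = 1.0) -> float:
    """Half-width of a two-sided (1 - delta) Hoeffding interval for the mean of n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def change_detected(values: List[float], window: int, delta: float) -> bool:
    """Compare the last `window` values against everything before them.

    Flags a change when the two empirical means differ by more than the sum of their
    Hoeffding radii. Assumes values lie in [0, 1]. No correction is made for running the
    test at every round, which a real analysis would have to account for.
    """
    if len(values) < 2 * window:
        return False  # not enough data since the last restart
    recent, older = values[-window:], values[:-window]
    gap = abs(sum(recent) / len(recent) - sum(older) / len(older))
    return gap > hoeffding_radius(len(recent), delta) + hoeffding_radius(len(older), delta)


def run_with_restarts(stream: Iterable[float], window: int = 50, delta: float = 0.01) -> List[int]:
    """Track rewards and forget all history whenever the test fires; return restart rounds."""
    history: List[float] = []
    restarts: List[int] = []
    for t, v in enumerate(stream):
        history.append(v)
        if change_detected(history, window, delta):
            restarts.append(t)
            history = []  # restart the (hypothetical) i.i.d. learner from scratch
    return restarts


# Mean reward drops from 0.9 to 0.1 at round 200.
print(run_with_restarts([0.9] * 200 + [0.1] * 200))
```

On this toy stream the test fires once, shortly after round 200, and stays quiet on a constant stream. The window size trades detection delay against false alarms: a longer window shrinks the recent-mean radius but reacts more slowly.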

Findings

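1. Achieves $\tilde{O}(\sqrt{ST})$ dynamic regret over $T$ rounds with $S$ stationary periods
2. Provides a parameter-free algorithm whose regret bounds depend on the number of stationary periods or a non-stationarity measure, without requiring either to be known in advance
3. Improves upon previous bounds for non-stationary bandit problems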
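To get a feel for how these bounds scale, the snippet below evaluates $\sqrt{ST}$ and $\Delta^{1/3}T^{2/3}$, where $\Delta$ is the non-stationarity measure the abstract refers to. Constants and logarithmic factors are dropped, so the numbers indicate growth rates only, not actual regret. The function names and the example values of $T$, $S$, and $\Delta$ are illustrative assumptions.

```python
import math


def sqrt_st_bound(S: int, T: int) -> float:
    """Shape of the O~(sqrt(S T)) dynamic-regret bound, constants and log factors dropped."""
    return math.sqrt(S * T)


def delta_bound(Delta: float, T: int) -> float:
    """Shape of the O~(Delta^{1/3} T^{2/3}) bound, constants and log factors dropped."""
    return Delta ** (1.0 / 3.0) * T ** (2.0 / 3.0)


if __name__ == "__main__":
    T = 10**6
    for S in (1, 10, 100):
        b = sqrt_st_bound(S, T)
        # Average regret per round is sqrt(S / T): it vanishes whenever S grows slower than T.
        print(f"S={S:>3}: sqrt(ST) = {b:,.0f}, per round = {b / T:.4f}")
    for Delta in (1.0, 8.0):
        b = delta_bound(Delta, T)
        print(f"Delta={Delta}: Delta^(1/3) T^(2/3) = {b:,.0f}, per round = {b / T:.4f}")
```

For example, with $T = 10^6$ and $S = 100$, $\sqrt{ST} = 10^4$, i.e. an average of about 0.01 regret per round.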

Abstract

Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution. We analyze various standard notions of regret suited to non-stationary environments for these algorithms, including interval regret, switching regret, and dynamic regret. When competing with the best policy at each time, one of our algorithms achieves regret $\tilde{O}(\sqrt{ST})$ if there are $T$ rounds with $S$ stationary periods, or more generally $\tilde{O}(\Delta^{1/3} T^{2/3})$ where $\Delta$ is some non-stationarity measure. These results almost match the…
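The regret notions listed in the abstract differ only in the comparator. As a minimal sketch, assume a full table `R[t][p]` of the reward that policy `p` would have earned at round `t` is available. A bandit learner never observes this table; it is used here only to evaluate regret after the fact. The hypothetical helpers below then compute static, interval, switching, and dynamic regret for a finite policy class. Switching regret with $S$ stationary periods is computed by a small dynamic program over comparator sequences with at most $S - 1$ switches.

```python
from typing import List, Sequence

Table = Sequence[Sequence[float]]  # R[t][p]: reward policy p would earn at round t


def static_regret(R: Table, learner: Sequence[float]) -> float:
    """Regret against the best single policy over all rounds."""
    n_policies = len(R[0])
    best = max(sum(row[p] for row in R) for p in range(n_policies))
    return best - sum(learner)


def interval_regret(R: Table, learner: Sequence[float], s: int, e: int) -> float:
    """Regret against the best single policy on rounds s, ..., e - 1."""
    n_policies = len(R[0])
    best = max(sum(R[t][p] for t in range(s, e)) for p in range(n_policies))
    return best - sum(learner[s:e])


def switching_regret(R: Table, learner: Sequence[float], S: int) -> float:
    """Regret against the best policy sequence with at most S stationary periods."""
    assert S >= 1
    n = len(R[0])
    neg = float("-inf")
    # dp[k][p]: best comparator reward so far using k periods, currently following p.
    dp: List[List[float]] = [[neg] * n for _ in range(S + 1)]
    dp[1] = [float(r) for r in R[0]]
    for t in range(1, len(R)):
        new = [[neg] * n for _ in range(S + 1)]
        for k in range(1, S + 1):
            # Best value reachable by opening period k at round t (switching from any policy).
            open_new = max(dp[k - 1]) if k > 1 else neg
            for p in range(n):
                new[k][p] = max(dp[k][p], open_new) + R[t][p]
        dp = new
    best = max(max(row) for row in dp[1:])
    return best - sum(learner)


def dynamic_regret(R: Table, learner: Sequence[float]) -> float:
    """Regret against the best policy at every single round."""
    return sum(max(row) for row in R) - sum(learner)


# Toy example: policy 0 is best for two rounds, then policy 1 is best.
R = [[1, 0], [1, 0], [0, 1], [0, 1]]
learner = [0.5, 0.5, 0.5, 0.5]
print(static_regret(R, learner))          # 0.0
print(interval_regret(R, learner, 0, 2))  # 1.0
print(switching_regret(R, learner, 2))    # 2.0
print(dynamic_regret(R, learner))         # 2.0
```

The toy example shows why a fixed-policy benchmark can be misleading: static regret is zero because both policies earn the same total, yet a comparator that switches once gains 2 more reward. With $S = 1$ switching regret reduces to static regret, and with $S \ge T$ it coincides with dynamic regret.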

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics