A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free

Yifang Chen; Chung-Wei Lee; Haipeng Luo; Chen-Yu Wei

arXiv:1902.00980·cs.LG·June 19, 2019·39 cites

TL;DR

This paper introduces a parameter-free, efficient, and optimal algorithm for non-stationary contextual bandits that adapts to unknown data shifts and achieves improved dynamic regret bounds.

Contribution

The authors develop the first adaptive, parameter-free algorithm for non-stationary contextual bandits with improved regret bounds and practical implementation via an ERM oracle.

Findings

01

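Achieves dynamic regret of O(min{√(ST), Δ^{1/3} T^{2/3}}), where T is the number of rounds, S the number of switches, and Δ the total variation in the data distributions.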

02

Introduces replay phases to detect non-stationarity.

03

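Improves on the O(min{S^{1/4} T^{3/4}, Δ^{1/5} T^{4/5}}) bound of Luo et al. (2018), and generalizes the O(√(ST)) result of Auer et al. (2018), which holds only for the two-armed bandit problem without contextual information.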

Abstract

We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal in terms of dynamic regret. Specifically, our algorithm achieves dynamic regret $O(\min\{\sqrt{ST}, \Delta^{\frac{1}{3}} T^{\frac{2}{3}}\})$ for a contextual bandit problem with $T$ rounds, $S$ switches and $\Delta$ total variation in data distributions. Importantly, our algorithm is adaptive and does not need to know $S$ or $\Delta$ ahead of time, and can be implemented efficiently assuming access to an ERM oracle. Our results strictly improve the $O(\min\{S^{\frac{1}{4}} T^{\frac{3}{4}}, \Delta^{\frac{1}{5}} T^{\frac{4}{5}}\})$ bound of (Luo et al., 2018), and greatly generalize and improve the $O(\sqrt{ST})$ result of (Auer et al., 2018) that holds only for the two-armed bandit problem without contextual information. The key novelty of our algorithm is to…
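
To get a feel for how the two rates in the abstract compare, the following minimal sketch evaluates both expressions for a few settings of $S$ and $\Delta$. It is only arithmetic on the stated rates, not the paper's algorithm: constants and any logarithmic factors hidden by the $O(\cdot)$ notation are dropped, and the function names `new_bound` and `prior_bound`, as well as the sample values of `T`, `S`, and `Delta`, are assumptions chosen for illustration.

```python
import math


def new_bound(T, S, Delta):
    """min{sqrt(S*T), Delta^(1/3) * T^(2/3)}, with constants and log factors dropped."""
    return min(math.sqrt(S * T), Delta ** (1 / 3) * T ** (2 / 3))


def prior_bound(T, S, Delta):
    """min{S^(1/4) * T^(3/4), Delta^(1/5) * T^(4/5)}, with constants and log factors dropped."""
    return min(S ** (1 / 4) * T ** (3 / 4), Delta ** (1 / 5) * T ** (4 / 5))


if __name__ == "__main__":
    T = 10**6  # hypothetical horizon
    # Hypothetical (S, Delta) pairs; both are kept below T.
    settings = [(10, 100.0), (100, 10.0), (1000, 1000.0)]
    print(f"{'S':>5} {'Delta':>8} {'new rate':>12} {'prior rate':>12}")
    for S, Delta in settings:
        new = new_bound(T, S, Delta)
        old = prior_bound(T, S, Delta)
        print(f"{S:>5} {Delta:>8.1f} {new:>12.1f} {old:>12.1f}")
```

Whenever $1 \le S \le T$ and $1 \le \Delta \le T$, each term of the new rate is no larger than the corresponding term of the prior rate, since $\sqrt{ST} \le S^{1/4} T^{3/4}$ is equivalent to $S \le T$ and $\Delta^{1/3} T^{2/3} \le \Delta^{1/5} T^{4/5}$ is equivalent to $\Delta \le T$.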

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Data Stream Mining Techniques