Learning to Optimize under Non-Stationarity

Wang Chi Cheung; David Simchi-Levi; Ruihao Zhu

arXiv:1810.03024·cs.LG·July 20, 2021

TL;DR

This paper develops algorithms for the non-stationary linear stochastic bandit problem with dynamic regret guarantees, covering applications such as dynamic pricing and ad allocation in a changing environment.

Contribution

It introduces the tuned Sliding Window UCB (SW-UCB) algorithm, which attains the optimal dynamic regret bound, and the tuning-free bandit-over-bandit (BOB) framework built on top of SW-UCB.

Findings

01

SW-UCB achieves $\tilde{O}(d^{2/3}(B_T+1)^{1/3}T^{2/3})$ dynamic regret.

02

The BOB framework attains $\tilde{O}(d^{2/3}(B_T+1)^{1/4}T^{3/4})$ dynamic regret.

03

Algorithms are effective in dynamic pricing and ad allocation scenarios.
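
To get a feel for how the two stated rates relate, the sketch below evaluates them numerically. It is only an illustration: the function names are hypothetical, and constants and logarithmic factors (which the tilde notation hides) are ignored. Dividing the BOB rate by the SW-UCB rate leaves $(T/(B_T+1))^{1/12}$, so the tuning-free bound is larger whenever $T > B_T + 1$.

```python
def sw_ucb_rate(d, b_t, t):
    """Leading term of the SW-UCB bound: d^(2/3) * (B_T + 1)^(1/3) * T^(2/3).

    Constants and log factors are dropped (an assumption of this sketch).
    """
    return d ** (2 / 3) * (b_t + 1) ** (1 / 3) * t ** (2 / 3)


def bob_rate(d, b_t, t):
    """Leading term of the BOB bound: d^(2/3) * (B_T + 1)^(1/4) * T^(3/4)."""
    return d ** (2 / 3) * (b_t + 1) ** (1 / 4) * t ** (3 / 4)


def bob_over_sw_ucb(b_t, t):
    """Closed-form ratio of the two leading terms: (T / (B_T + 1))^(1/12).

    The dimension d cancels, so it does not appear here.
    """
    return (t / (b_t + 1)) ** (1 / 12)
```

For example, `bob_rate(5, 10, 10_000) / sw_ucb_rate(5, 10, 10_000)` agrees with `bob_over_sw_ucb(10, 10_000)` up to floating-point error, which shows that the gap between the two bounds grows only slowly with the horizon.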

Abstract

We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ad allocation in a changing environment. We show how the difficulty posed by the non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining $d$, $B_{T}$, and $T$ as the problem dimension, the \emph{variation budget}, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (\texttt{SW-UCB}) algorithm with optimal $\tilde{O}(d^{2/3} (B_{T} + 1)^{1/3} T^{2/3})$ dynamic regret, and the tuning-free bandit-over-bandit (\texttt{BOB}) framework built on top of the \texttt{SW-UCB} algorithm with best $\tilde{O}(d^{2/3} (B_{T} + 1)^{1/4} T^{3/4})$ dynamic regret.
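
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general sliding-window idea, not the paper's exact SW-UCB: fit a ridge regression on the most recent `window` observations and pick the action with the largest upper confidence bound. The class name, the `window`, `reg`, and `beta` parameters, and their default values are all assumptions made for illustration; in particular, the tuned window size and confidence radius from the paper are not reproduced here.

```python
from collections import deque
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class SlidingWindowLinUCB:
    """Hypothetical sliding-window linear UCB learner (illustrative only)."""

    def __init__(self, dim, window, reg=1.0, beta=1.0):
        self.dim = dim
        self.reg = reg    # ridge regularization (assumed value)
        self.beta = beta  # confidence radius (assumed constant, not the paper's)
        # deque(maxlen=...) silently drops the oldest round once it is full,
        # which is what lets the estimate forget a drifting environment.
        self.history = deque(maxlen=window)

    def _design(self):
        """Return V = reg * I + sum x x^T and b = sum x * reward over the window."""
        V = [[self.reg if i == j else 0.0 for j in range(self.dim)]
             for i in range(self.dim)]
        b = [0.0] * self.dim
        for x, y in self.history:
            for i in range(self.dim):
                b[i] += x[i] * y
                for j in range(self.dim):
                    V[i][j] += x[i] * x[j]
        return V, b

    def select(self, actions):
        """Return the index of the action with the largest UCB score."""
        V, b = self._design()
        theta = solve(V, b)  # ridge estimate from the window only

        def ucb(x):
            width = math.sqrt(max(dot(x, solve(V, x)), 0.0))  # sqrt(x^T V^-1 x)
            return dot(x, theta) + self.beta * width

        return max(range(len(actions)), key=lambda i: ucb(actions[i]))

    def update(self, x, reward):
        """Record the chosen action vector and its observed reward."""
        self.history.append((list(x), reward))
```

A typical loop would call `select(actions)` each round, observe the reward of the chosen action, and pass both to `update`. A smaller window forgets stale data faster at the price of a noisier estimate; in this sketch that trade-off is left to the caller.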
