Hedging the Drift: Learning to Optimize under Non-Stationarity

Wang Chi Cheung; David Simchi-Levi; Ruihao Zhu

arXiv:1903.01461 · cs.LG · March 19, 2021 · 35 citations

TL;DR

This paper develops adaptive algorithms for non-stationary bandit problems, achieving near-optimal dynamic regret bounds by combining stochastic and adversarial learning techniques, and demonstrates superior empirical performance.

Contribution

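It introduces a general algorithmic recipe for non-stationary bandit problems, built around a sliding window-UCB algorithm that achieves the optimal dynamic regret bound when the variation budget is known, together with an adaptive framework for the case where the variation is not known in advance.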

Findings

01. Achieves state-of-the-art dynamic regret bounds in non-stationary bandit settings.

02. Demonstrates superior empirical performance on synthetic and real-world datasets.

03. Provides a parameter-free, adaptive learning framework for changing environments.

Abstract

We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) non-stationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Our main contribution is a general algorithmic recipe for a wide variety of non-stationary bandit problems. Specifically, we design and analyze the sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound for each of the settings when we know the respective underlying variation budget, which quantifies the total amount of temporal variation of the…
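To make the sliding-window idea in the abstract concrete, here is a minimal sketch for the simplest case, a K-armed bandit with rewards in [0, 1]. It is an illustration under stated assumptions, not the paper's algorithm as analyzed: the paper covers a wider variety of settings, and the class name `SlidingWindowUCB`, the helper `simulate`, the confidence-bonus constant, and the fixed `window` parameter are all hypothetical choices made here. The only idea taken from the abstract is that estimates and confidence bounds are computed from the most recent observations, so stale data is forgotten as the environment drifts.

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """Illustrative K-armed UCB whose statistics cover only the last `window` rounds."""

    def __init__(self, n_arms, window):
        self.n_arms = n_arms
        self.window = window
        self.history = deque()       # (arm, reward) pairs currently inside the window
        self.counts = [0] * n_arms   # pulls per arm inside the window
        self.sums = [0.0] * n_arms   # total reward per arm inside the window

    def select_arm(self):
        # Any arm with no observation left in the window is played first.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        n_obs = len(self.history)    # equals min(rounds played, window)

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            # Assumed bonus form: a standard UCB1-style radius over windowed counts.
            bonus = math.sqrt(2.0 * math.log(n_obs) / self.counts[arm])
            return mean + bonus

        return max(range(self.n_arms), key=index)

    def update(self, arm, reward):
        self.history.append((arm, reward))
        self.counts[arm] += 1
        self.sums[arm] += reward
        # Evict the oldest observation once the window is full.
        if len(self.history) > self.window:
            old_arm, old_reward = self.history.popleft()
            self.counts[old_arm] -= 1
            self.sums[old_arm] -= old_reward


def simulate(horizon=2000, window=200, seed=0):
    """Two Bernoulli arms whose means swap halfway; returns pulls per arm in the last 500 rounds."""
    rng = random.Random(seed)
    agent = SlidingWindowUCB(n_arms=2, window=window)
    late_pulls = [0, 0]
    for t in range(horizon):
        means = (0.9, 0.1) if t < horizon // 2 else (0.1, 0.9)
        arm = agent.select_arm()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        agent.update(arm, reward)
        if t >= horizon - 500:
            late_pulls[arm] += 1
    return late_pulls
```

Calling `simulate()` runs the sketch on a toy environment in which the better arm changes at the midpoint. Because observations older than `window` rounds are discarded, the agent should shift most of its late pulls to the arm that became better after the change. In this sketch the window length is fixed by hand; according to the abstract, the optimal bound for the sliding-window algorithm relies on knowing the variation budget.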

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management