Hedging the Drift: Learning to Optimize under Non-Stationarity
Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

TL;DR
This paper develops adaptive algorithms for non-stationary bandit problems, achieving near-optimal dynamic regret bounds by combining stochastic and adversarial learning techniques, and demonstrates superior empirical performance.
Contribution
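It introduces a general algorithmic recipe for non-stationary bandit problems. Its sliding-window upper confidence bound (SW-UCB) algorithm achieves the optimal dynamic regret bound when the variation budget is known in advance. When the variation budget is unknown, combining SW-UCB with an adversarial bandit algorithm yields an adaptive variant that still attains near-optimal regret.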
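The sliding-window idea can be illustrated with a simplified K-armed sketch in the style of the classic sliding-window UCB. The learner estimates each arm's mean using only the last `window` observations, so stale data from before a change is forgotten. This is not the paper's exact algorithm. The paper applies sliding-window estimation to more general settings, such as linear bandits. The class name `SlidingWindowUCB`, the exploration constant `xi`, and the bonus form `sqrt(xi * log(min(t, window)) / n)` are assumptions for illustration.

```python
import math
from collections import deque


class SlidingWindowUCB:
    """Simplified sliding-window UCB for a K-armed bandit (illustrative sketch).

    Only the most recent `window` (arm, reward) pairs are used to estimate
    each arm's mean reward; older observations fall out of the deque.
    """

    def __init__(self, n_arms, window, xi=0.6):
        self.n_arms = n_arms
        self.window = window
        self.xi = xi  # exploration constant (hypothetical default)
        self.history = deque(maxlen=window)  # oldest pair is dropped automatically
        self.t = 0  # number of rounds played so far

    def _window_stats(self):
        # Per-arm play counts and reward sums restricted to the sliding window.
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        return counts, sums

    def select(self):
        counts, sums = self._window_stats()
        # An arm with no observation inside the window is played first.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        log_term = math.log(min(self.t, self.window))

        def ucb(arm):
            # Windowed mean plus an exploration bonus that shrinks with plays.
            mean = sums[arm] / counts[arm]
            return mean + math.sqrt(self.xi * log_term / counts[arm])

        return max(range(self.n_arms), key=ucb)

    def update(self, arm, reward):
        self.history.append((arm, reward))
        self.t += 1
```

If arm 0 is best for the first half of a run and arm 1 is best afterwards, the windowed estimates let the learner switch to arm 1 within roughly one window length after the change. A full-history UCB would keep trusting arm 0's long record. The window length trades off this adaptivity against estimation noise, and that tension is what the variation budget is used to resolve.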
Findings
Achieves state-of-the-art dynamic regret bounds in non-stationary bandit settings.
Demonstrates superior empirical performance on synthetic and real-world datasets.
Provides a parameter-free, adaptive learning framework for changing environments.
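A parameter-free framework can work by hedging over the unknown window length: an adversarial bandit learner picks which window size a stochastic learner should use. The following simplified sketch captures that idea in the spirit of the paper's Bandit-over-Bandit framework. An EXP3 master chooses an SW-UCB window at the start of each block and is rewarded with that block's average reward. Several details are assumptions for illustration, not the paper's tuned construction: the block length, the power-of-two candidate set, `gamma`, restarting SW-UCB in every block, and the names `sw_ucb_block` and `bandit_over_bandit`.

```python
import math
import random
from collections import deque


def sw_ucb_block(reward_fn, start, length, n_arms, window, xi=0.6):
    """Run a fresh sliding-window UCB learner for `length` rounds from `start`.

    reward_fn(t, arm) must return a reward in [0, 1]; returns the block total.
    """
    history = deque(maxlen=window)
    total = 0.0
    for step in range(length):
        counts, sums = [0] * n_arms, [0.0] * n_arms
        for arm, r in history:
            counts[arm] += 1
            sums[arm] += r
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]  # play arms unseen within the window first
        else:
            log_term = math.log(min(step, window))
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(xi * log_term / counts[a]))
        r = reward_fn(start + step, arm)
        history.append((arm, r))
        total += r
    return total


def bandit_over_bandit(reward_fn, horizon, block_len, n_arms, gamma=0.1, seed=0):
    """EXP3 master that hedges over candidate SW-UCB window sizes, block by block."""
    rng = random.Random(seed)
    # Hypothetical candidate set: powers of two up to the block length.
    candidates = [2 ** k for k in range(int(math.log2(block_len)) + 1)]
    n_cand = len(candidates)
    weights = [1.0] * n_cand
    total_reward, chosen = 0.0, []
    for start in range(0, horizon, block_len):
        length = min(block_len, horizon - start)
        w_sum = sum(weights)
        # EXP3 sampling distribution: exponential weights mixed with uniform.
        probs = [(1 - gamma) * w / w_sum + gamma / n_cand for w in weights]
        j = rng.choices(range(n_cand), weights=probs)[0]
        block_reward = sw_ucb_block(reward_fn, start, length, n_arms, candidates[j])
        total_reward += block_reward
        chosen.append(candidates[j])
        # Importance-weighted estimate of the block reward rescaled to [0, 1].
        x_hat = (block_reward / length) / probs[j]
        weights[j] *= math.exp(gamma * x_hat / n_cand)
        m = max(weights)
        weights = [w / m for w in weights]  # rescale to avoid overflow
    return total_reward, chosen
```

The stochastic learner (SW-UCB) handles learning within a block. The adversarial learner (EXP3) treats each window size as an arm whose payoff may change arbitrarily across blocks. This is why no knowledge of the variation budget is needed up front.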
Abstract
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) non-stationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Our main contribution is a general algorithmic recipe for a wide variety of non-stationary bandit problems. Specifically, we design and analyze the sliding window-upper confidence bound algorithm that achieves the optimal dynamic regret bound for each of the settings when we know the respective underlying variation budget, which quantifies the total amount of temporal variation of the…
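To make the two quantities in the abstract concrete, the following sketch computes a variation budget and a dynamic regret for a K-armed bandit with known per-round mean rewards. It assumes the K-armed definitions common in the non-stationary bandit literature. The variation is the sum over rounds of the largest change in any arm's mean, and dynamic regret compares against the best arm at each round rather than one fixed arm. The paper's own settings may measure variation differently, for example through changes in a parameter vector. The function names are hypothetical.

```python
def variation_budget(means):
    """Total temporal variation: sum over t of max_i |mu_{t+1}(i) - mu_t(i)|.

    `means` is a length-T list; means[t] lists the K arms' mean rewards at round t.
    """
    return sum(
        max(abs(nxt_i - cur_i) for cur_i, nxt_i in zip(cur, nxt))
        for cur, nxt in zip(means, means[1:])
    )


def dynamic_regret(means, actions):
    """Sum over t of (best mean at round t) minus (mean of the arm played at t)."""
    return sum(max(mu) - mu[a] for mu, a in zip(means, actions))


# One abrupt change: arm 0 is best for 100 rounds, then arm 1 is best.
means = [[1.0, 0.5]] * 100 + [[0.0, 0.5]] * 100
print(variation_budget(means))             # 1.0
print(dynamic_regret(means, [1] * 200))    # 50.0
```

In this example, always playing arm 1 has total expected reward equal to that of always playing arm 0, so regret against the best fixed arm is zero. Its dynamic regret, however, is 50, since it ignores the change. A small variation budget limits how often and how far the best arm can move, and this is what makes low dynamic regret achievable.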
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
