Efficient Contextual Bandits in Non-stationary Worlds
Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford

TL;DR
This paper introduces efficient contextual bandit algorithms for non-stationary environments. They adapt to distribution changes as they occur, achieve near-optimal regret bounds, and include the first dynamic regret guarantee for an efficient algorithm in the fully adversarial setting.
Contribution
Equips existing contextual bandit methods for i.i.d. problems with statistical tests that detect a change in distribution, yielding efficient algorithms for non-stationary settings and the first dynamic regret guarantees for an efficient algorithm in fully adversarial environments.
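To make the idea of a statistical test for distribution change concrete, here is a minimal sketch in Python. It is not the test used in the paper, which is not reproduced on this page. It is a generic two-sample check with a Hoeffding-style threshold on a single stream of rewards in [0, 1]. The function name `detect_change` and the parameters `window` and `delta` are assumptions made for illustration.

```python
import math


def detect_change(rewards, window, delta=0.05):
    """Return the first round t at which the mean of the last `window`
    rewards differs from the mean of all earlier rewards by more than a
    Hoeffding-style threshold, or None if no change is flagged.

    Illustrative only: rewards are assumed to lie in [0, 1].
    """
    for t in range(2 * window, len(rewards) + 1):
        recent = rewards[t - window:t]
        past = rewards[:t - window]
        gap = abs(sum(recent) / len(recent) - sum(past) / len(past))
        # Two-sided Hoeffding radius for a mean of n samples in [0, 1]:
        # sqrt(log(2 / delta) / (2 n)). The two radii are added because
        # both empirical means are noisy.
        radius = (math.sqrt(math.log(2 / delta) / (2 * len(recent)))
                  + math.sqrt(math.log(2 / delta) / (2 * len(past))))
        if gap > radius:
            return t
    return None


# Hypothetical stream: the mean reward jumps from 0.2 to 0.8 at round 200.
stream = [0.2] * 200 + [0.8] * 200
flagged_at = detect_change(stream, window=50)
```

A bandit algorithm wrapped around such a check would restart its underlying i.i.d. learner once a change is flagged. How the tests are constructed so that this preserves the regret guarantees is the subject of the paper itself.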
Findings
Achieves $\tilde{O}(\sqrt{ST})$ regret over $T$ rounds with $S$ stationary periods
Provides a parameter-free algorithm with regret bounds depending on non-stationarity measures
Improves upon previous bounds for non-stationary bandit problems
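As a rough numerical illustration of the $\sqrt{ST}$ rate for $T$ rounds with $S$ stationary periods, the sketch below evaluates it for a few values of $S$. Constants and logarithmic factors hidden by the $\tilde{O}$ notation are ignored, and the helper name `sqrt_st_bound` and the chosen values of $T$ and $S$ are assumptions for illustration only.

```python
import math


def sqrt_st_bound(T, S):
    """Return sqrt(S * T), ignoring constants and log factors."""
    return math.sqrt(S * T)


T = 10_000
for S in (1, 4, 100):
    bound = sqrt_st_bound(T, S)
    # Dividing by T gives the average regret per round implied by the rate.
    print(f"S={S:>3}  sqrt(ST)={bound:8.1f}  per-round={bound / T:.4f}")
```

The rate stays sublinear in $T$ as long as $S$ grows more slowly than $T$. When $S = T$ it becomes linear, so the guarantee is only informative when distribution changes are infrequent relative to the horizon.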
Abstract
Most contextual bandit algorithms minimize regret against the best fixed policy, a questionable benchmark for non-stationary environments that are ubiquitous in applications. In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution. We analyze various standard notions of regret suited to non-stationary environments for these algorithms, including interval regret, switching regret, and dynamic regret. When competing with the best policy at each time, one of our algorithms achieves regret $\tilde{O}(\sqrt{ST})$ if there are $T$ rounds with $S$ stationary periods, or more generally $\tilde{O}(\Delta^{1/3}T^{2/3})$ where $\Delta$ is some non-stationarity measure. These results almost match the…
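The difference between competing with the best fixed policy and competing with the best policy at each time can be seen on a toy example. The sketch below is illustrative only: the two-policy problem, the reward numbers, and the function names are assumptions, not taken from the paper.

```python
# Toy problem (hypothetical numbers): 2 policies, T = 6 rounds, S = 2
# stationary periods. expected_reward[t][pi] is the expected reward of
# policy pi at round t.
expected_reward = [
    [0.9, 0.1], [0.9, 0.1], [0.9, 0.1],  # period 1: policy 0 is best
    [0.2, 0.8], [0.2, 0.8], [0.2, 0.8],  # period 2: policy 1 is best
]


def best_fixed_total(rewards):
    """Total reward of the single best policy in hindsight."""
    n_policies = len(rewards[0])
    return max(sum(r[pi] for r in rewards) for pi in range(n_policies))


def best_per_round_total(rewards):
    """Total reward of the best policy at each round (dynamic benchmark)."""
    return sum(max(r) for r in rewards)


def regret(rewards, played, benchmark_total):
    """Benchmark total minus the reward earned by the played policies."""
    earned = sum(rewards[t][pi] for t, pi in enumerate(played))
    return benchmark_total - earned


# A learner that notices the change two rounds late.
played = [0, 0, 0, 0, 0, 1]
static_regret = regret(expected_reward, played, best_fixed_total(expected_reward))
dynamic_regret = regret(expected_reward, played, best_per_round_total(expected_reward))
```

In this example the learner beats the best fixed policy, so its regret against that benchmark is negative, while its regret against the per-round benchmark is positive. This is why a fixed-policy benchmark says little about performance once the environment changes.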
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
