Non-Stationary Off-Policy Optimization
Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, and Amr Ahmed

TL;DR
This paper introduces a novel method for off-policy optimization in non-stationary environments, combining offline data partitioning and adaptive online policy switching, with theoretical guarantees and empirical validation.
Contribution
It proposes a two-phase approach for off-policy optimization in piecewise-stationary settings, with guarantees on policy quality and regret, advancing the handling of non-stationarity in off-policy learning.
Findings
Outperforms state-of-the-art baselines on synthetic datasets.
Effective in real-world non-stationary environments.
Provides theoretical guarantees on policy quality and regret.
Abstract
Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. Our proposed solution has two phases. In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state. In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance. This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment. To show the effectiveness of our approach, we compare it to…
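The two phases described in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: this summary does not specify the paper's estimator, policy class, or switching rule. The sketch assumes the logged data has already been partitioned by latent state, uses a plain inverse-propensity-score (IPS) estimate to pick a sub-policy per state, and swaps in a simple sliding-window average as the online switching rule. All names (`ips_value`, `learn_sub_policy`, `SlidingWindowSwitcher`) are hypothetical.

```python
import random
from collections import deque


def ips_value(logs, policy):
    """IPS estimate of a deterministic policy's value.

    logs: list of (context, action, reward, propensity) tuples, where
    propensity is the logging policy's probability of the logged action.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        if policy(context) == action:
            total += reward / propensity
    return total / len(logs)


def constant_policy(action):
    """Policy that ignores the context (a stand-in for a richer policy class)."""
    return lambda context: action


def learn_sub_policy(logs, actions):
    """Offline phase (assumed form): best constant-action policy under IPS."""
    candidates = [constant_policy(a) for a in actions]
    return max(candidates, key=lambda pi: ips_value(logs, pi))


class SlidingWindowSwitcher:
    """Online phase (assumed form): play the sub-policy with the best
    recent average reward, with optional epsilon exploration."""

    def __init__(self, sub_policies, window=50, epsilon=0.1, seed=0):
        self.sub_policies = sub_policies
        self.history = [deque(maxlen=window) for _ in sub_policies]
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.current = 0

    def select(self):
        untried = [i for i, h in enumerate(self.history) if not h]
        if untried:
            # Try every sub-policy once before comparing averages.
            self.current = untried[0]
        elif self.rng.random() < self.epsilon:
            self.current = self.rng.randrange(len(self.sub_policies))
        else:
            means = [sum(h) / len(h) for h in self.history]
            self.current = means.index(max(means))
        return self.current

    def act(self, context):
        return self.sub_policies[self.select()](context)

    def update(self, reward):
        # Credit the reward to the sub-policy that was just played.
        self.history[self.current].append(reward)
```

In use, one would call `learn_sub_policy` once per latent state on that state's share of the logs, pass the resulting list to `SlidingWindowSwitcher`, and then alternate `act(context)` and `update(reward)` during deployment. The window length controls how quickly the switcher forgets rewards from before a change point.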
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
