Non-Stationary Off-Policy Optimization

Joey Hong; Branislav Kveton; Manzil Zaheer; Yinlam Chow; Amr Ahmed

arXiv:2006.08236 · cs.LG · April 6, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a novel method for off-policy optimization in non-stationary environments, combining offline data partitioning and adaptive online policy switching, with theoretical guarantees and empirical validation.

Contribution

It proposes a two-phase approach for off-policy optimization in piecewise-stationary settings, with guarantees on policy quality and regret, advancing the handling of non-stationarity in off-policy learning.

Findings

01. Outperforms state-of-the-art baselines on synthetic datasets.
02. Effective in real-world non-stationary environments.
03. Provides theoretical guarantees on policy quality and regret.

Abstract

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. Our proposed solution has two phases. In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state. In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance. This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment. To show the effectiveness of our approach, we compare it to…
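
The online deployment phase described in the abstract, switching between pre-learned sub-policies based on their observed performance, could be sketched roughly as follows. This is only an illustration and not the paper's algorithm: it swaps in a simple sliding-window epsilon-greedy rule, and the names SubPolicySwitcher and simulate, the window size, the exploration rate, and the reward probabilities of the toy two-state environment are all assumptions made for this example.

```python
import random
from collections import deque


class SubPolicySwitcher:
    """Hypothetical switcher: epsilon-greedy over sub-policies,
    scored by the mean of their most recent rewards."""

    def __init__(self, n_policies, window=50, epsilon=0.1, seed=0):
        # One bounded reward history per sub-policy; old rewards fall out,
        # which lets the scores track a changing environment.
        self.windows = [deque(maxlen=window) for _ in range(n_policies)]
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def select(self):
        # Try every sub-policy once before comparing them.
        for i, w in enumerate(self.windows):
            if not w:
                return i
        # Explore occasionally so stale scores get refreshed.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.windows))
        means = [sum(w) / len(w) for w in self.windows]
        return max(range(len(means)), key=means.__getitem__)

    def update(self, i, reward):
        self.windows[i].append(reward)


def simulate(horizon=2000, change_at=1000, seed=0):
    """Toy piecewise-stationary run (assumed setup): the latent state flips
    once, and sub-policy k earns reward 1 with probability 0.9 in state k
    and 0.1 otherwise."""
    env_rng = random.Random(seed)
    switcher = SubPolicySwitcher(n_policies=2, seed=seed + 1)
    picks = []
    for t in range(horizon):
        state = 0 if t < change_at else 1
        i = switcher.select()
        p = 0.9 if i == state else 0.1
        reward = 1.0 if env_rng.random() < p else 0.0
        switcher.update(i, reward)
        picks.append(i)
    return picks


if __name__ == "__main__":
    picks = simulate()
    before = sum(1 for i in picks[800:1000] if i == 0) / 200
    after = sum(1 for i in picks[1800:2000] if i == 1) / 200
    print(f"matching sub-policy chosen: {before:.2f} before change, {after:.2f} after")
```

The bounded window is the design choice that matters here: because only recent rewards count, a sub-policy that stops performing well after a change loses its lead within a few dozen rounds, and occasional exploration lets the better-matched sub-policy take over.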

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management