Off-Policy Evaluation and Learning for the Future under Non-Stationarity

Tatsuhiro Shimizu; Kazuki Kawamura; Takanori Muroi; Yusuke Narita; Kei Tateno; Takuma Udagawa; Yuta Saito

arXiv:2506.20417 · cs.LG · June 26, 2025

TL;DR

This paper introduces OPFV, an importance-weighted estimator that exploits time-series structure to estimate and optimize the future value of policies in non-stationary environments, where existing methods incur significant bias because they assume stationarity or depend on restrictive reward-modeling assumptions.

Contribution

The paper proposes OPFV, the first estimator to exploit temporal structures for future off-policy evaluation and learning in non-stationary settings, with theoretical and empirical validation.

Findings

01. OPFV outperforms existing methods in estimating future policy value.

02. The approach effectively leverages seasonal and temporal patterns.

03. The method enables proactive policy optimization in changing environments.

Abstract

We study the novel problem of future off-policy evaluation (F-OPE) and learning (F-OPL) for estimating and optimizing the future value of policies in non-stationary environments, where distributions vary over time. In e-commerce recommendations, for instance, our goal is often to estimate and optimize the policy value for the upcoming month using data collected by an old policy in the previous month. A critical challenge is that data related to the future environment is not observed in the historical data. Existing methods assume stationarity or depend on restrictive reward-modeling assumptions, leading to significant bias. To address these limitations, we propose a novel estimator named Off-Policy Estimator for the Future Value (OPFV), designed for accurately estimating policy values at any future time point. The key…

Taxonomy

Topics: Climate Change Policy and Economics