Policy-Adaptive Estimator Selection for Off-Policy Evaluation

Takuma Udagawa; Haruka Kiyohara; Yusuke Narita; Yuta Saito; Kei Tateno

arXiv:2211.13904 · cs.LG · January 31, 2023

TL;DR

This paper introduces a data-driven method for selecting the most accurate off-policy evaluation (OPE) estimator for a given task. The method adaptively subsamples the logged data and constructs pseudo policies, which improves the accuracy of estimator selection.

Contribution

The paper presents the first approach to adaptive estimator selection in OPE, addressing the challenge of identifying the most accurate estimator from logged data alone.

Findings

01 Substantially improves estimator selection accuracy over non-adaptive methods.

02 Effective on both synthetic and real-world datasets.

03 Demonstrates significant gains in OPE performance.

Abstract

Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. Although many estimators have been developed, there is no single estimator that dominates the others, because the estimators' accuracy can vary greatly depending on a given OPE task such as the evaluation policy, number of actions, and noise level. Thus, the data-driven estimator selection problem is becoming increasingly important and can have a significant impact on the accuracy of OPE. However, identifying the most accurate estimator using only the logged data is quite challenging because the ground-truth estimation accuracy of estimators is generally unavailable. This paper studies this challenging problem of estimator selection for OPE for the first time. In particular, we enable an estimator selection that is adaptive to a given OPE task, by…
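To make the setting described in the abstract concrete, here is a minimal sketch of two standard OPE estimators, inverse propensity scoring (IPS) and the direct method (DM), applied to a synthetic context-free bandit log. This is not the paper's selection procedure. The data generator, the function names, and the policies are all hypothetical and chosen only for illustration. The sketch shows why estimator selection is hard: the squared error of each estimator can be computed here only because the synthetic generator exposes the true expected rewards, which real logged data never does.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def make_logged_data(n, n_actions, rng):
    """Hypothetical synthetic log: actions sampled from a logging policy.

    Returns the true expected rewards q (hidden in practice), the logging
    policy pi_b, the logged actions a, and the observed binary rewards r.
    """
    q = rng.uniform(0.0, 1.0, size=n_actions)
    # Mix with a uniform policy so every action has probability >= 0.5 / n_actions.
    pi_b = 0.5 * softmax(rng.normal(size=n_actions)) + 0.5 / n_actions
    a = rng.choice(n_actions, size=n, p=pi_b)
    r = rng.binomial(1, q[a]).astype(float)
    return q, pi_b, a, r


def ips(pi_e, pi_b, a, r):
    """Inverse propensity scoring: reweight logged rewards by pi_e / pi_b."""
    w = pi_e[a] / pi_b[a]
    return float(np.mean(w * r))


def fit_q_hat(a, r, n_actions):
    """Simple reward model: empirical mean reward per action."""
    sums = np.bincount(a, weights=r, minlength=n_actions)
    counts = np.bincount(a, minlength=n_actions)
    return sums / np.maximum(counts, 1)


def direct_method(pi_e, q_hat):
    """Direct method: average the reward model under the evaluation policy."""
    return float(pi_e @ q_hat)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_actions = 5
    q, pi_b, a, r = make_logged_data(1_000, n_actions, rng)
    pi_e = softmax(rng.normal(size=n_actions))  # evaluation policy (assumed)

    truth = float(pi_e @ q)  # available only because the data are synthetic
    estimates = {
        "IPS": ips(pi_e, pi_b, a, r),
        "DM": direct_method(pi_e, fit_q_hat(a, r, n_actions)),
    }
    for name, value in estimates.items():
        print(f"{name}: estimate={value:.4f}, squared error={(value - truth) ** 2:.6f}")
```

Which estimator has the smaller error in a run like this depends on the sample size, the number of actions, and how far the evaluation policy is from the logging policy, which is the task dependence the abstract refers to.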

Code & Models

Repositories

sony/ds-research-code
PyTorch · Official

Videos

Policy-Adaptive Estimator Selection for Off-Policy Evaluation · Underline

Taxonomy

Topics: Advanced Causal Inference Techniques · Machine Learning and Algorithms · Efficiency Analysis Using DEA