Cross-Validated Off-Policy Evaluation
Matej Cief, Branislav Kveton, Michal Kompan

TL;DR
This paper demonstrates how cross-validation can be effectively used for off-policy evaluation, providing practical guidance and empirical evidence to improve estimator selection and hyper-parameter tuning in this context.
Contribution
The paper shows how to apply cross-validation to off-policy evaluation, challenging the popular belief that doing so is infeasible, and gives practitioners a practical tool for estimator selection and hyper-parameter tuning.
Findings
Cross-validation improves estimator selection in off-policy evaluation.
The proposed method performs well across various use cases.
Empirical results validate the effectiveness of the approach.
Abstract
We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation in off-policy evaluation is not feasible. We evaluate our method empirically and show that it addresses a variety of use cases.
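The abstract does not spell out the procedure, so the following is only a minimal sketch of one way cross-validation could be used to choose among off-policy estimators, not the authors' algorithm. It assumes a context-free bandit with logged actions, rewards, and known logging propensities; it repeatedly splits the log in half, computes each candidate estimator on one half, and scores it against an unclipped IPS estimate (unbiased when propensities are known) on the other half. All function names, the candidate set, the 50/50 split, and the clipping threshold are illustrative assumptions.

```python
import numpy as np

def ips(actions, rewards, logging_probs, target_probs, clip=np.inf):
    """(Optionally clipped) inverse propensity score estimate of the policy value."""
    weights = target_probs[actions] / logging_probs[actions]
    return float(np.mean(np.minimum(weights, clip) * rewards))

def direct_method(actions, rewards, logging_probs, target_probs):
    """Direct method: plug per-action empirical mean rewards into the target policy."""
    n_actions = len(target_probs)
    sums = np.bincount(actions, weights=rewards, minlength=n_actions)
    counts = np.bincount(actions, minlength=n_actions)
    # Actions never observed in this split get an estimated mean reward of 0.
    means = np.divide(sums, counts, out=np.zeros(n_actions), where=counts > 0)
    return float(target_probs @ means)

# Hypothetical candidate set; the clipping threshold 2.0 is arbitrary.
CANDIDATES = {
    "ips": ips,
    "clipped_ips": lambda a, r, p0, p: ips(a, r, p0, p, clip=2.0),
    "dm": direct_method,
}

def select_estimator(actions, rewards, logging_probs, target_probs,
                     candidates=CANDIDATES, n_splits=20, seed=0):
    """Return the candidate with the smallest mean squared gap to held-out IPS."""
    rng = np.random.default_rng(seed)
    n = len(actions)
    errors = {name: 0.0 for name in candidates}
    for _ in range(n_splits):
        perm = rng.permutation(n)
        train, valid = perm[: n // 2], perm[n // 2:]
        # Unclipped IPS on the held-out half serves as the validation target.
        reference = ips(actions[valid], rewards[valid], logging_probs, target_probs)
        for name, estimator in candidates.items():
            estimate = estimator(actions[train], rewards[train],
                                 logging_probs, target_probs)
            errors[name] += (estimate - reference) ** 2 / n_splits
    return min(errors, key=errors.get), errors

# Usage on synthetic logged data (all numbers are made up for illustration).
rng = np.random.default_rng(1)
logging_probs = np.array([0.5, 0.3, 0.2])
target_probs = np.array([0.1, 0.2, 0.7])
true_means = np.array([0.2, 0.5, 0.8])
actions = rng.choice(3, size=2000, p=logging_probs)
rewards = rng.binomial(1, true_means[actions]).astype(float)
best, errors = select_estimator(actions, rewards, logging_probs, target_probs)
print(best, errors)
```

The held-out IPS target is noisy, so the squared gap mixes the candidate's error with the validation estimate's own variance; a practical scheme would need to account for that, for example through the choice of split sizes.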
Taxonomy
Topics: Evaluation and Performance Assessment
