Meta Off-Policy Estimation

Olivier Jeunen

arXiv:2508.07914 · stat.ML · August 12, 2025

TL;DR

This paper introduces a meta-analysis approach for off-policy evaluation in recommender systems, combining multiple estimators and their confidence intervals to produce more accurate and statistically efficient value estimates.

Contribution

It proposes a correlated fixed-effects meta-analysis framework to integrate multiple OPE estimators, accounting for their dependencies to improve estimation accuracy.
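As a rough illustration of the idea, the sketch below combines correlated unbiased estimates using the textbook generalized least squares (inverse-covariance) weighting, which gives the best linear unbiased estimate when the covariance matrix is known. The function name `combine_estimates`, the example numbers, and the normal-approximation interval are assumptions made for illustration; the paper's exact formulation, and how it estimates the covariance from shared data, may differ.

```python
import numpy as np


def combine_estimates(estimates, cov):
    """Combine correlated unbiased estimates of one quantity (hypothetical helper).

    estimates: length-k sequence of point estimates.
    cov: k x k covariance matrix of those estimates (assumed known here).
    Returns the combined estimate, its variance, and the weights used.
    """
    theta = np.asarray(estimates, dtype=float)
    sigma = np.asarray(cov, dtype=float)
    ones = np.ones_like(theta)

    # Solve sigma @ x = 1 instead of forming the explicit inverse.
    sigma_inv_ones = np.linalg.solve(sigma, ones)
    normaliser = ones @ sigma_inv_ones

    # Weights sum to one, so the combination stays unbiased.
    weights = sigma_inv_ones / normaliser
    combined = float(weights @ theta)
    variance = float(1.0 / normaliser)
    return combined, variance, weights


# Made-up numbers: two estimators that share data, hence a positive covariance.
estimates = [0.10, 0.12]
cov = [[0.0004, 0.0001],
       [0.0001, 0.0009]]

combined, variance, weights = combine_estimates(estimates, cov)

# Normal-approximation 95% interval (an assumption, not taken from the paper).
half_width = 1.96 * np.sqrt(variance)
print(f"combined = {combined:.4f} +/- {half_width:.4f}, weights = {weights}")
```

In this sketch the off-diagonal covariance terms are what separate the correlated combination from a naive inverse-variance average: treating estimators that share data as independent would understate the variance of the combined estimate.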

Findings

01 Improved statistical efficiency over individual estimators

02 Validated on simulated and real-world data

03 Produces conservative confidence intervals reflecting estimator dependencies

Abstract

Off-policy estimation (OPE) methods enable unbiased offline evaluation of recommender systems, directly estimating the online reward some target policy would have obtained, from offline data and with statistical guarantees. The theoretical elegance of the framework, combined with practical successes, has led to a surge of interest, with many competing estimators now available to practitioners and researchers. Among these, Doubly Robust methods provide a prominent strategy to combine value- and policy-based estimators. In this work, we take an alternative perspective to combine a set of OPE estimators and their associated confidence intervals into a single, more accurate estimate. Our approach leverages a correlated fixed-effects meta-analysis framework, explicitly accounting for dependencies among estimators that arise due to shared data. This yields a best linear unbiased estimate…
