
TL;DR
This paper introduces a meta-analysis approach for off-policy evaluation in recommender systems, combining multiple estimators and their confidence intervals to produce more accurate and statistically efficient value estimates.
Contribution
The paper proposes a correlated fixed-effects meta-analysis framework that integrates multiple OPE estimators and accounts for the dependencies among them to improve estimation accuracy.
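For reference, this is the textbook form of a correlated fixed-effects combination. It is assumed here for illustration and not copied from the paper. Stack K unbiased estimates of the same policy value V into a vector with covariance matrix Σ; the off-diagonal entries of Σ capture dependence from shared data. The best linear unbiased estimate (BLUE) and its variance are:

```latex
\hat{\boldsymbol\theta} = (\hat V_1,\dots,\hat V_K)^\top,\qquad
\mathbb{E}[\hat V_k] = V,\qquad \operatorname{Cov}(\hat{\boldsymbol\theta}) = \Sigma

\hat V_{\mathrm{BLUE}} = \mathbf{w}^\top \hat{\boldsymbol\theta},\qquad
\mathbf{w} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}},\qquad
\operatorname{Var}\!\left(\hat V_{\mathrm{BLUE}}\right) = \frac{1}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}
```

The weights sum to one, which keeps the combination unbiased. They may be negative when estimators are strongly correlated. Picking any single estimator is itself a feasible weighting, so the resulting variance is never larger than the smallest diagonal entry of Σ.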
Findings
Improved statistical efficiency over individual estimators
Validated on simulated and real-world data
Produces conservative confidence intervals reflecting estimator dependencies
Abstract
Off-policy estimation (OPE) methods enable unbiased offline evaluation of recommender systems, directly estimating the online reward some target policy would have obtained, from offline data and with statistical guarantees. The theoretical elegance of the framework combined with practical successes have led to a surge of interest, with many competing estimators now available to practitioners and researchers. Among these, Doubly Robust methods provide a prominent strategy to combine value- and policy-based estimators. In this work, we take an alternative perspective to combine a set of OPE estimators and their associated confidence intervals into a single, more accurate estimate. Our approach leverages a correlated fixed-effects meta-analysis framework, explicitly accounting for dependencies among estimators that arise due to shared data. This yields a best linear unbiased estimate…
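The sketch below shows how dependence "due to shared data" can enter such a combination. It is a minimal NumPy example under a stated assumption: each estimator is a sample mean of per-row terms computed on the same n logged rows (as IPS-, DM- or DR-style estimators are), so the covariance of the K means can be estimated from the per-row terms. The function name `blue_combine` and the normal-approximation interval are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def blue_combine(per_sample_terms: np.ndarray, z: float = 1.96):
    """Hypothetical sketch: combine K OPE estimators computed on shared data.

    per_sample_terms has shape (n, K); column k holds the per-row terms whose
    mean is estimator k. Assumes K >= 2 and a non-singular covariance.
    Returns (estimate, (lower, upper), weights).
    """
    n, k = per_sample_terms.shape
    theta = per_sample_terms.mean(axis=0)  # the K individual estimates
    # Covariance of the K sample means; rows shared by all estimators
    # produce non-zero off-diagonal entries.
    sigma = np.cov(per_sample_terms, rowvar=False) / n
    ones = np.ones(k)
    sigma_inv_ones = np.linalg.solve(sigma, ones)  # Sigma^{-1} 1
    precision = ones @ sigma_inv_ones  # 1^T Sigma^{-1} 1
    weights = sigma_inv_ones / precision  # BLUE weights, sum to 1
    estimate = weights @ theta
    se = np.sqrt(1.0 / precision)  # standard error of the combined estimate
    # Normal-approximation interval (an assumption of this sketch).
    return estimate, (estimate - z * se, estimate + z * se), weights
```

A diagonal Σ would treat the estimators as independent and overstate the precision gained. Keeping the full covariance gives an interval that reflects how much information the estimators actually share.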
