$\Delta\text{-}{\rm OPE}$: Off-Policy Estimation with Pairs of Policies

Olivier Jeunen; Aleksei Ustimenko

arXiv:2405.10024 · cs.LG · September 17, 2024

TL;DR

This paper introduces $\Delta\text{-}{\rm OPE}$, a pairwise off-policy estimation method that reduces variance in policy value difference estimation, improving offline evaluation and learning in recommendation systems.

Contribution

The paper proposes $\Delta\text{-}{\rm OPE}$, a novel pairwise off-policy estimation framework that leverages covariance between policies to reduce variance and enhance efficiency.
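Here is a sketch of why covariance matters. The notation is assumed here and is not taken from the paper. Let $\hat V(\pi_t)$ and $\hat V(\pi_p)$ be value estimates for a target policy and a production policy, computed on the same logged data. The variance of their difference is

$$\mathrm{Var}\big[\hat V(\pi_t) - \hat V(\pi_p)\big] = \mathrm{Var}\big[\hat V(\pi_t)\big] + \mathrm{Var}\big[\hat V(\pi_p)\big] - 2\,\mathrm{Cov}\big[\hat V(\pi_t), \hat V(\pi_p)\big].$$

A positive covariance therefore makes this smaller than the sum of the individual variances, which is what you would get by estimating each value on independent samples.

Consider the plain inverse propensity scoring (IPS) estimator with a logging policy $\pi_0$ and $n$ logged triples $(x_i, a_i, r_i)$. On shared data, the difference of the two IPS estimates collapses into a single estimator whose weights are differences of policy probabilities:

$$\hat\Delta_{\rm IPS}(\pi_t, \pi_p) = \frac{1}{n}\sum_{i=1}^{n} \frac{\pi_t(a_i \mid x_i) - \pi_p(a_i \mid x_i)}{\pi_0(a_i \mid x_i)}\, r_i.$$

When $\pi_t$ and $\pi_p$ assign similar probabilities to most actions, these weights largely cancel. This covers only the plain IPS case; the estimators proposed in the paper may add further corrections on top of it.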

Findings

01

$\Delta\text{-}{\rm OPE}$ improves estimation accuracy in simulations and real experiments.

02

Variance reduction leads to better policy evaluation and learning outcomes.

03

The method outperforms traditional estimators in offline and online settings.

Abstract

The off-policy paradigm casts recommendation as a counterfactual decision-making task, allowing practitioners to unbiasedly estimate online metrics using offline data. This leads to effective evaluation metrics, as well as learning procedures that directly optimise online success. Nevertheless, the high variance that comes with unbiasedness is typically the crux that complicates practical applications. An important insight is that the difference between policy values can often be estimated with significantly reduced variance, if said policies have positive covariance. This allows us to formulate a pairwise off-policy estimation task: $\Delta\text{-}{\rm OPE}$. $\Delta\text{-}{\rm OPE}$ subsumes the common use-case of estimating improvements of a learnt policy over a production policy, using data collected by a stochastic logging policy. We introduce $\Delta\text{-}{\rm OPE}$ methods…
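Below is a minimal simulation sketch of this use-case. It assumes a hypothetical context-free bandit with three actions, a uniform stochastic logging policy, and made-up click rates; none of these numbers come from the paper. The simulation compares two ways of estimating the improvement of a learnt policy over a production policy:

- a single IPS difference estimate computed on one shared log;
- the difference of two IPS estimates, each computed on its own independent log.

It illustrates the covariance argument with plain IPS only, not the $\Delta\text{-}{\rm OPE}$ estimators introduced in the paper.

```python
import random
import statistics

# Hypothetical toy bandit: 3 actions, no context. All numbers are made up.
ACTIONS = [0, 1, 2]
CLICK_RATE = [0.2, 0.5, 0.6]      # assumed Bernoulli reward per action
LOGGING = [1 / 3, 1 / 3, 1 / 3]   # stochastic logging policy pi_0
PRODUCTION = [0.2, 0.5, 0.3]      # production policy pi_p
TARGET = [0.1, 0.5, 0.4]          # learnt target policy pi_t


def collect_logs(n, rng):
    """Sample n (action, reward) pairs by following the logging policy."""
    logs = []
    for _ in range(n):
        a = rng.choices(ACTIONS, weights=LOGGING)[0]
        r = 1.0 if rng.random() < CLICK_RATE[a] else 0.0
        logs.append((a, r))
    return logs


def ips_value(policy, logs):
    """Plain IPS estimate of a single policy's value."""
    return sum(policy[a] / LOGGING[a] * r for a, r in logs) / len(logs)


def delta_ips(target, production, logs):
    """IPS estimate of V(target) - V(production) from one shared log."""
    return sum((target[a] - production[a]) / LOGGING[a] * r
               for a, r in logs) / len(logs)


def compare_variances(trials, n, seed):
    """Return (variance with shared logs, variance with independent logs)."""
    rng = random.Random(seed)
    shared, independent = [], []
    for _ in range(trials):
        # Pairwise estimate: both policies are evaluated on the same samples.
        logs = collect_logs(n, rng)
        shared.append(delta_ips(TARGET, PRODUCTION, logs))
        # Baseline: estimate each policy's value on its own, separate log.
        logs_t, logs_p = collect_logs(n, rng), collect_logs(n, rng)
        independent.append(ips_value(TARGET, logs_t) - ips_value(PRODUCTION, logs_p))
    return statistics.variance(shared), statistics.variance(independent)


if __name__ == "__main__":
    var_shared, var_indep = compare_variances(trials=300, n=200, seed=0)
    print(f"Var, shared logs (delta-IPS):      {var_shared:.2e}")
    print(f"Var, independent logs (IPS - IPS): {var_indep:.2e}")
```

With these toy policies, the shared-log estimate should show a much smaller variance. Both policies place probability 0.5 on action 1, so samples of that action drop out of the difference entirely. On action 0 and action 2 the remaining weights are small.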

Code & Models

Repositories

olivierjeunen/delta-ope-recsys-2024 (official)
