Logging Policy Design for Off-Policy Evaluation

Connor Douglas; Joel Persson; Foster Provost

arXiv:2605.15108 · stat.ML · May 18, 2026

TL;DR

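This paper studies how to design logging policies that minimize off-policy evaluation error for a given target policy. The central tension is a reward-coverage tradeoff: concentrating logging probability on high-reward actions reduces variance but can leave too little data on actions the target policy may take. The paper offers theoretical and practical guidance, with recommender systems as a motivating example.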

Contribution

It introduces a unifying framework for logging policy design, derives optimal policies for three informational regimes (target policy and reward distribution known, unknown, or partially known at logging time), and provides practical principles for real-world implementation.

Findings

01 Characterized the reward-coverage tradeoff in logging policy design.

02 Derived optimal logging policies for regimes where the target policy and reward distribution are known, unknown, or partially known through priors or noisy estimates.

03 Provided practical guidelines for policy design under operational constraints.

Abstract

Off-policy evaluation (OPE) estimates the value of a target treatment policy (e.g., a recommender system) using data collected by a different logging policy. It enables high-stakes experimentation without live deployment, yet in practice accuracy depends heavily on the logging policy used to collect data for computing the estimate. We study how to design logging policies that minimize OPE error for given target policies. We characterize a fundamental reward-coverage tradeoff: concentrating probability mass on high-reward actions reduces variance but risks missing signal on actions the target policy may take. We propose a unifying framework for logging policy design and derive optimal policies in canonical informational regimes where the target policy and reward distribution are (i) known, (ii) unknown, and (iii) partially known through priors or noisy estimates at logging time. Our…
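The reward-coverage tradeoff described above can be illustrated with a minimal sketch. It is not the paper's method: it assumes a context-free problem with three actions, Bernoulli rewards, and the standard inverse-propensity-scoring (IPS) estimator, and every name and number in it (reward_means, target, min_variance_logging, and so on) is hypothetical. For this simplified setting, a standard importance-sampling result says the per-sample IPS variance is minimized by logging probabilities proportional to pi(a) * sqrt(E[r^2 | a]), which shifts mass toward high-reward actions while keeping every action the target policy takes covered.

```python
import numpy as np


def ips_estimate(actions, rewards, target, logging):
    """IPS estimate of the target policy's value from logged (action, reward) pairs."""
    weights = target[actions] / logging[actions]
    return float(np.mean(weights * rewards))


def ips_variance(target, logging, reward_means):
    """Exact per-sample variance of the IPS term, assuming Bernoulli rewards.

    For Bernoulli rewards E[r^2 | a] equals the mean reward of action a.
    """
    value = float(np.sum(target * reward_means))
    second_moment = float(np.sum(target**2 * reward_means / logging))
    return second_moment - value**2


def min_variance_logging(target, reward_means):
    """Variance-minimizing logging policy when rewards are fully known (assumption)."""
    scores = target * np.sqrt(reward_means)
    return scores / scores.sum()


# Hypothetical problem: the target policy favors the lowest-reward action.
reward_means = np.array([0.9, 0.5, 0.1])
target = np.array([0.2, 0.3, 0.5])

uniform = np.full(3, 1.0 / 3.0)
oracle = min_variance_logging(target, reward_means)

# Compare the per-sample variance under three candidate logging policies.
candidates = {"uniform": uniform, "copy of target": target, "known-reward optimum": oracle}
variances = {name: ips_variance(target, mu, reward_means) for name, mu in candidates.items()}
for name, var in variances.items():
    print(f"{name:>22}: variance = {var:.4f}")

# Simulate logging under the variance-minimizing policy and evaluate the target.
rng = np.random.default_rng(0)
actions = rng.choice(3, size=50_000, p=oracle)
rewards = rng.binomial(1, reward_means[actions])
estimate = ips_estimate(actions, rewards, target, oracle)
true_value = float(target @ reward_means)
print(f"IPS estimate = {estimate:.4f}, true value = {true_value:.4f}")
```

In this toy example the variance-minimizing policy places more probability on the high-reward action than the target policy does, yet never drops an action the target policy can take; a logging policy that assigned zero probability to such an action would leave the IPS estimate without data for it. The regimes with unknown or partially known rewards that the abstract mentions are not covered by this sketch.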
