Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation

Olivier Jeunen; Shashank Gupta

arXiv:2602.14914·cs.LG·April 28, 2026

TL;DR

This paper proves that an optimal additive control variate asymptotically dominates self-normalisation in mean squared error for off-policy evaluation, providing theoretical justification for shifting to optimal baseline corrections.

Contribution

The paper establishes that $β^{⋆}$-IPS, an estimator with an optimal additive baseline, asymptotically dominates SNIPS in mean squared error, supplying the theoretical guarantee for evaluation that was previously lacking.

Findings

01

$β^{⋆}$-IPS asymptotically dominates SNIPS in MSE

02

SNIPS is equivalent to a sub-optimal additive baseline

03

Theoretical analysis justifies using optimal baseline corrections

Abstract

Off-policy evaluation (OPE) is essential for assessing ranking and recommendation systems without costly online interventions. Self-Normalised Inverse Propensity Scoring (SNIPS) is a standard tool for variance reduction in OPE, leveraging a multiplicative control variate. Recent advances in off-policy learning suggest that additive control variates (baseline corrections) may offer superior performance, yet theoretical guarantees for evaluation are lacking. This paper provides a definitive answer: we prove that $β^{⋆}$-IPS, an estimator with an optimal additive baseline, asymptotically dominates SNIPS in Mean Squared Error. By analytically decomposing the variance gap, we show that SNIPS is asymptotically equivalent to using a specific -- but generally sub-optimal -- additive baseline. Our results theoretically justify shifting from self-normalisation to optimal baseline…
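
The three estimators contrasted in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation. The function names (`ips`, `snips`, `beta_ips`, `plug_in_beta`) and the toy logging setup are hypothetical. The baseline coefficient uses the textbook variance-minimising control-variate formula, Cov(w·r, w) / Var(w), estimated from the same sample; whether this matches the paper's exact definition of $β^{⋆}$ is an assumption, and the plug-in estimate can introduce a small finite-sample bias.

```python
import numpy as np

def ips(w, r):
    """Plain inverse propensity scoring: mean of importance-weighted rewards."""
    return float(np.mean(w * r))

def snips(w, r):
    """Self-normalised IPS: divide by the sum of weights (multiplicative control variate)."""
    return float(np.sum(w * r) / np.sum(w))

def beta_ips(w, r, beta):
    """IPS with an additive baseline beta: beta + mean(w * (r - beta))."""
    return float(beta + np.mean(w * (r - beta)))

def plug_in_beta(w, r):
    """Hypothetical plug-in baseline: empirical Cov(w*r, w) / Var(w).

    Assumption: this is the standard control-variate coefficient, not
    necessarily the exact estimator used in the paper.
    """
    wr = w * r
    cov = np.mean(wr * w) - np.mean(wr) * np.mean(w)
    var = np.var(w)
    return float(cov / var) if var > 0 else 0.0

if __name__ == "__main__":
    # Toy logged-bandit data (all values here are illustrative assumptions).
    rng = np.random.default_rng(0)
    logging = np.full(5, 0.2)                        # uniform logging policy
    target = np.array([0.6, 0.1, 0.1, 0.1, 0.1])     # target policy to evaluate
    reward_means = np.array([0.9, 0.5, 0.3, 0.2, 0.1])

    actions = rng.choice(5, size=1000, p=logging)
    w = target[actions] / logging[actions]           # importance weights
    r = rng.binomial(1, reward_means[actions]).astype(float)

    beta = plug_in_beta(w, r)
    print("true value:", float(target @ reward_means))
    print("IPS:       ", ips(w, r))
    print("SNIPS:     ", snips(w, r))
    print("beta-IPS:  ", beta_ips(w, r, beta))
```

Setting `beta=0` recovers plain IPS, which makes the additive family easy to compare against SNIPS on the same logged data.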
