Additive Control Variates Dominate Self-Normalisation in Off-Policy Evaluation
Olivier Jeunen, Shashank Gupta

TL;DR
This paper proves that an optimal additive control variate asymptotically dominates self-normalisation in mean squared error for off-policy evaluation, providing theoretical justification for shifting to optimal baseline corrections.
Contribution
It establishes that $\beta^\star$-IPS, an estimator with an optimal additive baseline, asymptotically dominates SNIPS in mean squared error, filling a theoretical gap.
Findings
$\beta^\star$-IPS asymptotically dominates SNIPS in MSE
SNIPS is equivalent to a sub-optimal additive baseline
Theoretical analysis justifies using optimal baseline corrections
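
For orientation, the estimators named in these findings can be written out as below. This is a sketch in common OPE notation, not the paper's own: the symbols $w_i$, $r_i$, $n$, $\pi$, $\pi_0$ and $V(\pi)$ are assumptions, since this summary does not reproduce the paper's definitions. Here $w_i = \pi(a_i \mid x_i) / \pi_0(a_i \mid x_i)$ is the importance weight of logged sample $i$, with $\mathbb{E}[w] = 1$, and $r_i$ is its observed reward.

```latex
\begin{align*}
\hat{V}_{\mathrm{IPS}}   &= \frac{1}{n} \sum_{i=1}^{n} w_i r_i, \\
\hat{V}_{\mathrm{SNIPS}} &= \frac{\sum_{i=1}^{n} w_i r_i}{\sum_{i=1}^{n} w_i}, \\
\hat{V}_{\beta}          &= \beta + \frac{1}{n} \sum_{i=1}^{n} w_i \,(r_i - \beta), \\
\beta^\star              &= \arg\min_{\beta} \operatorname{Var}\!\big(w\,(r - \beta)\big)
                          = \frac{\operatorname{Cov}(w r,\, w)}{\operatorname{Var}(w)}.
\end{align*}
```

A first-order (delta-method) expansion of the SNIPS ratio around $\mathbb{E}[w] = 1$ gives $\hat{V}_{\mathrm{SNIPS}} \approx V(\pi) + \frac{1}{n} \sum_i w_i \,(r_i - V(\pi))$, which has the form of $\hat{V}_{\beta}$ with $\beta = V(\pi)$. We take this to be the sub-optimal baseline the second finding refers to, though the summary itself does not name it. Because $\beta^\star$ minimises the variance over all fixed $\beta$, any other choice, including $\beta = V(\pi)$, can do no better asymptotically.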
Abstract
Off-policy evaluation (OPE) is essential for assessing ranking and recommendation systems without costly online interventions. Self-Normalised Inverse Propensity Scoring (SNIPS) is a standard tool for variance reduction in OPE, leveraging a multiplicative control variate. Recent advances in off-policy learning suggest that additive control variates (baseline corrections) may offer superior performance, yet theoretical guarantees for evaluation are lacking. This paper provides a definitive answer: we prove that $\beta^\star$-IPS, an estimator with an optimal additive baseline, asymptotically dominates SNIPS in Mean Squared Error. By analytically decomposing the variance gap, we show that SNIPS is asymptotically equivalent to using a specific -- but generally sub-optimal -- additive baseline. Our results theoretically justify shifting from self-normalisation to optimal baseline…
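
The comparison described in the abstract can be tried numerically with a few lines of NumPy. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`ips`, `snips`, `beta_ips`, `beta_star_plugin`) are hypothetical, the baseline is the standard variance-minimising control-variate coefficient $\mathrm{Cov}(wr, w)/\mathrm{Var}(w)$ estimated from the same sample, and the simulated five-action bandit in the demo is invented purely for illustration.

```python
import numpy as np


def ips(w, r):
    """Plain IPS: sample mean of importance-weighted rewards."""
    return float(np.mean(w * r))


def snips(w, r):
    """Self-normalised IPS: normalise by the sum of weights instead of n."""
    return float(np.sum(w * r) / np.sum(w))


def beta_ips(w, r, beta):
    """IPS with an additive baseline beta (unbiased for any fixed beta)."""
    return float(beta + np.mean(w * (r - beta)))


def beta_star_plugin(w, r):
    """Plug-in estimate of Cov(w*r, w) / Var(w) from the logged sample.

    Assumption: this is the textbook control-variate coefficient; the
    paper's exact estimator of the optimal baseline may differ.
    """
    wr = w * r
    var_w = np.mean((w - w.mean()) ** 2)
    if var_w == 0.0:
        return 0.0  # degenerate case: fall back to no baseline
    cov = np.mean((wr - wr.mean()) * (w - w.mean()))
    return float(cov / var_w)


if __name__ == "__main__":
    # Hypothetical synthetic bandit, for illustration only.
    rng = np.random.default_rng(0)
    n, k = 10_000, 5
    pi0 = np.full(k, 1.0 / k)                  # uniform logging policy
    pi = np.array([0.6, 0.1, 0.1, 0.1, 0.1])   # target policy
    mean_reward = np.linspace(0.1, 0.9, k)     # per-action Bernoulli means

    a = rng.choice(k, size=n, p=pi0)           # logged actions
    r = rng.binomial(1, mean_reward[a]).astype(float)
    w = pi[a] / pi0[a]                         # importance weights

    beta = beta_star_plugin(w, r)
    print("true value :", float(pi @ mean_reward))
    print("IPS        :", ips(w, r))
    print("SNIPS      :", snips(w, r))
    print("beta*-IPS  :", beta_ips(w, r, beta))
```

A single run only shows point estimates; to see the variance gap the abstract describes, one would repeat the simulation over many seeds and compare the empirical MSE of each estimator. Estimating the baseline from the same sample it is applied to is a simplification chosen here for brevity.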
