Offline A/B testing for Recommender Systems
Alexandre Gilotte, Clément Calauzènes, Thomas Nedelec, Alexandre Abraham, Simon Dollé

TL;DR
This paper evaluates offline counterfactual estimators for recommender systems, identifying limitations of traditional methods and proposing improved variants that better predict online A/B test outcomes.
Contribution
It introduces two novel counterfactual estimators with improved bias-variance trade-offs for offline evaluation of recommender systems.
Findings
Traditional estimators such as capped and normalised importance sampling show unsatisfying bias-variance trade-offs in personalised product recommendation for online advertising.
The proposed estimators demonstrate higher correlation with actual business metrics.
Benchmark results validate the effectiveness of the new estimators in real-world scenarios.
Abstract
Before A/B testing a new version of a recommender system online, it is usual to perform some offline evaluations on historical data. We focus on evaluation methods that compute an estimator of the potential uplift in revenue that this new technology could generate. This helps to iterate faster and to avoid losing money by detecting poor policies. These estimators are known as counterfactual or off-policy estimators. We show that traditional counterfactual estimators such as capped importance sampling and normalised importance sampling do not, in our experiments, achieve satisfying bias-variance compromises in the context of personalised product recommendation for online advertising. We propose two variants of counterfactual estimates with different modelling of the bias that prove to be accurate in real-world conditions. We provide a benchmark of these estimators by showing their correlation…
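For readers unfamiliar with the two baseline estimators named in the abstract, the following is a minimal sketch of their textbook forms, not the paper's implementation or its proposed variants. It assumes logged data with one reward per event and known action probabilities under both the logging policy and the new policy; the function names, the argument names (pi_new, pi_log, cap), and the toy numbers are illustrative assumptions.

```python
import numpy as np


def importance_weights(pi_new, pi_log):
    """Ratio of new-policy to logging-policy probability for each logged action."""
    return np.asarray(pi_new, dtype=float) / np.asarray(pi_log, dtype=float)


def importance_sampling(rewards, weights):
    """Plain importance sampling: unbiased, but high variance when weights are large."""
    return float(np.mean(weights * np.asarray(rewards, dtype=float)))


def capped_importance_sampling(rewards, weights, cap):
    """Capped variant: clip each weight at `cap`, trading variance for bias."""
    clipped = np.minimum(weights, cap)
    return float(np.mean(clipped * np.asarray(rewards, dtype=float)))


def normalised_importance_sampling(rewards, weights):
    """Self-normalised variant: divide by the sum of weights instead of the sample size."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.sum(weights * rewards) / np.sum(weights))


# Toy logged data (hypothetical values, for illustration only).
rewards = [1.0, 0.0, 1.0, 0.0]
pi_log = [0.5, 0.5, 0.25, 0.1]   # probability of the logged action under the logging policy
pi_new = [0.5, 0.25, 0.5, 0.5]   # probability of the same action under the new policy

w = importance_weights(pi_new, pi_log)
estimates = {
    "is": importance_sampling(rewards, w),
    "capped": capped_importance_sampling(rewards, w, cap=1.5),
    "normalised": normalised_importance_sampling(rewards, w),
}
print(estimates)
```

The three estimates differ on the same data because capping and normalisation each shrink the influence of large weights in a different way, which is the bias-variance compromise the abstract refers to.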
