Off-policy evaluation for learning-to-rank via interpolating the item-position model and the position-based model
Alexander Buchholz, Ben London, Giuseppe di Benedetto, Thorsten Joachims

TL;DR
This paper introduces INTERPOL, an off-policy evaluation estimator for learning-to-rank that interpolates between the position-based model and the item-position model, trading bias against variance when ranking policies are assessed offline.
Contribution
The paper proposes INTERPOL, an estimator that addresses the bias of a potentially misspecified position-based model and the large variance of the item-position model, and supports it with theoretical arguments and empirical results.
Findings
INTERPOL addresses the bias that arises when the position-based model is misspecified.
It offers an adaptable bias-variance trade-off relative to the item-position model.
Theoretical arguments and empirical results support the estimator's performance.
Abstract
A critical need for industrial recommender systems is the ability to evaluate recommendation policies offline, before deploying them to production. Unfortunately, widely used off-policy evaluation methods either make strong assumptions about how users behave that can lead to excessive bias, or they make fewer assumptions and suffer from large variance. We tackle this problem by developing a new estimator that mitigates the problems of the two most popular off-policy estimators for rankings, namely the position-based model and the item-position model. In particular, the new estimator, called INTERPOL, addresses the bias of a potentially misspecified position-based model, while providing an adaptable bias-variance trade-off compared to the item-position model. We provide theoretical arguments as well as empirical results that highlight the performance of our novel estimation approach.
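The interpolation idea can be illustrated with a minimal sketch. It is not the paper's definition of INTERPOL, only one plausible way to move between the two estimators named in the abstract. It assumes a single context, known marginal placement probabilities for the logging and target policies, and known examination probabilities `theta`. The function names and the `window` parameter are hypothetical. With `window=0` the weight reduces to the item-position ratio; with a window covering all positions it becomes the position-based ratio.

```python
import numpy as np

def windowed_weight(pi_target, pi_logging, theta, item, pos, window):
    """Importance weight for a click on `item` logged at position `pos`.

    pi_target, pi_logging: arrays of shape (n_items, n_positions) holding the
        marginal probability that each policy places an item at a position.
    theta: array of shape (n_positions,), assumed examination probabilities.
    window: hypothetical half-width of the pooled position range around `pos`.
    """
    n_positions = theta.shape[0]
    lo = max(0, pos - window)
    hi = min(n_positions, pos + window + 1)
    # Exposure of the item under each policy, restricted to the window.
    num = np.dot(theta[lo:hi], pi_target[item, lo:hi])
    den = np.dot(theta[lo:hi], pi_logging[item, lo:hi])
    return num / den

def estimate_clicks(logs, pi_target, pi_logging, theta, window):
    """Average reweighted clicks per logged ranking.

    logs: list of (ranking, clicks) pairs, where ranking[k] is the item shown
        at position k and clicks[k] is 1 if that item was clicked, else 0.
    """
    total = 0.0
    for ranking, clicks in logs:
        for pos, (item, click) in enumerate(zip(ranking, clicks)):
            if click:
                total += windowed_weight(
                    pi_target, pi_logging, theta, item, pos, window
                )
    return total / len(logs)

# Toy setup: uniform logging policy, deterministic target policy.
theta = np.array([1.0, 0.5, 0.25])
pi_logging = np.full((3, 3), 1.0 / 3.0)
pi_target = np.eye(3)
logs = [([0, 1, 2], [1, 0, 0]), ([1, 0, 2], [1, 0, 0])]

w_ipm = windowed_weight(pi_target, pi_logging, theta, item=0, pos=0, window=0)
w_mid = windowed_weight(pi_target, pi_logging, theta, item=0, pos=0, window=1)
w_pbm = windowed_weight(pi_target, pi_logging, theta, item=0, pos=0, window=2)
v_ipm = estimate_clicks(logs, pi_target, pi_logging, theta, window=0)
```

In this toy setup the weight shrinks as the window widens, which reflects the general intuition: pooling over more positions relies more heavily on the assumed `theta` (more potential bias) in exchange for smaller, more stable weights (less variance).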
Taxonomy
Topics: Recommender Systems and Techniques · Optimization and Search Problems
