TL;DR
This paper introduces a new off-policy evaluation method for ranking policies that balances bias and variance by leveraging the cascade user behavior model, improving accuracy in real-world recommender systems.
Contribution
The paper proposes the Cascade Doubly Robust estimator for OPE of ranking policies, which is unbiased under broader conditions and uses a control variate to reduce variance.
Findings
The estimator outperforms existing methods on synthetic data.
It achieves more accurate evaluations on real-world datasets.
It effectively balances bias and variance in ranking OPE.
Abstract
In real-world recommender systems and search engines, optimizing ranking decisions to present a ranked list of relevant items is critical. Off-policy evaluation (OPE) for ranking policies is thus gaining growing interest because it enables performance estimation of new ranking policies using only logged data. Although OPE in contextual bandits has been studied extensively, its naive application to the ranking setting faces a critical variance issue due to the huge item space. To tackle this problem, previous studies introduce assumptions on user behavior to make the combinatorial item space tractable. However, an unrealistic assumption may, in turn, cause serious bias. Therefore, appropriately controlling the bias-variance tradeoff by imposing a reasonable assumption is the key to success in OPE of ranking policies. To achieve a well-balanced bias-variance tradeoff, we propose…
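To make the variance issue concrete, the following is a minimal illustrative sketch, not code from the paper. It assumes a hypothetical logging policy that picks rankings uniformly at random and a deterministic target policy, so the importance weight of a matching ranking equals the number of possible rankings. The function names `num_rankings` and `prefix_weights` are made up for this illustration. The second function shows how a weight that depends only on the items shown in the top k positions, the kind of restriction a cascade-style behavior assumption permits, grows more slowly than the full-ranking weight.

```python
import math


def num_rankings(n_items: int, slate_size: int) -> int:
    """Count ordered lists of `slate_size` distinct items out of `n_items`."""
    return math.perm(n_items, slate_size)


def prefix_weights(n_items: int, slate_size: int) -> list[int]:
    """Importance weight of the top-k prefix for k = 1..slate_size.

    Assumption (illustrative only): the logging policy is uniform over all
    rankings and the target policy is deterministic, so the weight of a
    matching prefix of length k is 1 / (1 / perm(n_items, k)).
    """
    return [math.perm(n_items, k) for k in range(1, slate_size + 1)]


if __name__ == "__main__":
    n_items, slate_size = 10, 3
    # Weight when the whole ranking must match (no behavior assumption).
    print("full-ranking weight:", num_rankings(n_items, slate_size))
    # Weights when only the top-k prefix must match.
    print("prefix weights:", prefix_weights(n_items, slate_size))
```

Even in this toy setting with 10 items and 3 positions, the full-ranking weight is 720, while the weights for positions 1 and 2 are 10 and 90. Larger weights generally mean higher variance, which is the tradeoff the abstract describes.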
