TL;DR
This paper introduces the first policy-aware unbiased learning to rank (LTR) method for top-k rankings. It corrects for interaction bias in logged user data when only the top k items are displayed, and it supports the optimization of top-k metrics.
Contribution
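It proposes a novel policy-aware counterfactual estimator for top-k rankings. The estimator accounts for a stochastic logging policy and is proven unbiased as long as every relevant item has a non-zero probability of appearing in the top-k ranking. The paper also extends traditional LTR methods to counterfactual learning.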
Findings
In the reported experiments, the performance of the policy-aware estimator is not affected by the size of k.
The method achieves the same retrieval performance from top-k feedback as from full ranking feedback.
The paper introduces the first policy-aware unbiased LTR approach for top-k settings.
Abstract
Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking. There is currently no existing counterfactual unbiased LTR method for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability to appear in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions…
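The unbiasedness condition in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes a position-based examination model, a logging policy given as an explicit distribution over rankings, and per-item metric weights; the names `policy_aware_propensities`, `expected_ips_estimate`, `exam_prob_at_rank`, and `gain` are hypothetical and chosen only for this example. The idea shown is that an item's propensity is its examination probability averaged over the rankings the policy may display, so an item cut off at rank k in one ranking can still have a non-zero propensity if another ranking shows it.

```python
from collections import defaultdict


def policy_aware_propensities(policy, exam_prob_at_rank, k):
    """Probability that each item is examined, marginalised over the policy.

    policy: list of (ranking, probability) pairs; a ranking is a tuple of item ids.
    exam_prob_at_rank: examination probabilities for ranks 1..k (assumed known).
    Items never placed in a top-k get no entry, i.e. propensity zero.
    """
    rho = defaultdict(float)
    for ranking, prob in policy:
        # Only the top k items are displayed, so only they can be examined.
        for rank, item in enumerate(ranking[:k]):
            rho[item] += prob * exam_prob_at_rank[rank]
    return dict(rho)


def expected_ips_estimate(policy, exam_prob_at_rank, k, relevance, gain, rho):
    """Exact expectation of the inverse-propensity-weighted click sum.

    Assumes a click happens with probability examination * relevance.
    """
    total = 0.0
    for ranking, prob in policy:
        for rank, item in enumerate(ranking[:k]):
            click_prob = exam_prob_at_rank[rank] * relevance[item]
            total += prob * click_prob * gain[item] / rho[item]
    return total


# Hypothetical toy setup: three items, only the top 2 are displayed.
k = 2
exam_prob_at_rank = [1.0, 0.5]
policy = [(("a", "b", "c"), 0.5), (("c", "a", "b"), 0.5)]
relevance = {"a": 1.0, "b": 0.0, "c": 1.0}
gain = {"a": 1.0, "b": 0.7, "c": 0.5}  # illustrative metric weights

rho = policy_aware_propensities(policy, exam_prob_at_rank, k)
estimate = expected_ips_estimate(policy, exam_prob_at_rank, k, relevance, gain, rho)
true_value = sum(relevance[d] * gain[d] for d in relevance)

# A deterministic policy never shows "c" in the top 2, so "c" has no propensity.
deterministic_rho = policy_aware_propensities([(("a", "b", "c"), 1.0)], exam_prob_at_rank, k)
```

In this toy setup every relevant item appears in some top-2 ranking, so the expected estimate matches the true metric value. Under the deterministic policy the relevant item "c" is never displayed, which is the case where the condition stated in the abstract fails.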
