TL;DR
This paper introduces LogOpt, an online evaluation algorithm that optimizes the logging policy so that counterfactual ranking evaluation is unbiased and has minimal variance. In large-scale simulations it outperforms interleaving methods.
Contribution
The paper presents LogOpt, a novel online evaluation method that minimizes variance and ensures unbiased counterfactual estimates, improving over existing interleaving techniques.
Findings
LogOpt achieves unbiased online evaluation of ranking systems.
It reduces variance, leading to faster convergence.
It outperforms interleaving methods in large-scale simulations.
Abstract
Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences between ranking systems based on historical interaction data, while mitigating the effect of position bias and item-selection bias. We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data so that the counterfactual estimate has minimal variance. As minimizing variance leads to faster convergence, LogOpt increases the data-efficiency of counterfactual estimation. LogOpt turns the counterfactual approach - which is indifferent to the logging policy - into an online approach, where the algorithm decides what rankings to display. We prove that, as an online evaluation method, LogOpt is unbiased w.r.t. position and item-selection bias, unlike existing interleaving methods. Furthermore, we perform large-scale experiments by simulating comparisons between…
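The link between the logging policy and the variance of the estimate can be illustrated with a small sketch. This is not the paper's LogOpt algorithm or its estimator: it is a toy, ranking-level inverse-propensity estimate of the CTR difference between two fixed, distinct rankings, under an assumed position-based click model. The examination probabilities, relevance values, and all function names are hypothetical and chosen only for illustration. The sketch computes the estimator's mean and variance exactly, and shows that the mean does not depend on how often each ranking is logged while the variance does.

```python
import math

# Hypothetical toy values: examination probability per rank, relevance per item.
EXAMINATION = [1.0, 0.5, 0.25]
RELEVANCE = {"a": 0.9, "b": 0.5, "c": 0.1}


def click_moments(ranking):
    """Return E[C] and E[C^2] for the total click count C on one displayed ranking.

    Assumes clicks at different ranks are independent Bernoulli variables with
    probability examination(rank) * relevance(item).
    """
    probs = [EXAMINATION[k] * RELEVANCE[d] for k, d in enumerate(ranking)]
    mean = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    return mean, variance + mean ** 2


def estimator_mean_and_variance(rank_a, rank_b, p_a):
    """Exact mean and variance of a single-impression IPS estimate of CTR(A) - CTR(B).

    The logging policy shows rank_a with probability p_a and rank_b otherwise
    (0 < p_a < 1, and rank_a must differ from rank_b). The estimate is
    +C / p_a when rank_a was shown and -C / (1 - p_a) when rank_b was shown.
    """
    mean_a, second_a = click_moments(rank_a)
    mean_b, second_b = click_moments(rank_b)
    mean = mean_a - mean_b  # the propensities cancel, so p_a does not appear
    second = second_a / p_a + second_b / (1.0 - p_a)
    return mean, second - mean ** 2


def variance_minimizing_p(rank_a, rank_b):
    """Closed-form p_a that minimizes second_a / p + second_b / (1 - p)."""
    _, second_a = click_moments(rank_a)
    _, second_b = click_moments(rank_b)
    root_a, root_b = math.sqrt(second_a), math.sqrt(second_b)
    return root_a / (root_a + root_b)


if __name__ == "__main__":
    rank_a, rank_b = ["a", "b", "c"], ["c", "b", "a"]
    p_opt = variance_minimizing_p(rank_a, rank_b)
    for p in (0.2, 0.5, p_opt):
        mean, var = estimator_mean_and_variance(rank_a, rank_b, p)
        print(f"p_a={p:.3f}  mean={mean:.3f}  variance={var:.3f}")
```

In this toy setting the minimizer has a closed form because only two rankings are involved. The abstract describes optimizing a logging policy over rankings more generally, which this sketch does not attempt.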
