Inverse Propensity Score based offline estimator for deterministic ranking lists using position bias
Nick Wood, Sumit Sidana

TL;DR
This paper introduces a new inverse propensity score estimator for deterministic ranking lists that accounts for position bias, enabling more effective offline policy evaluation with industry-scale data.
Contribution
It proposes a novel IPS-based estimator tailored for deterministic policies using position bias modeling, expanding the applicability of offline policy evaluation.
Findings
Strong correlation between offline and online results
Estimator performs well with accurate user behavior models
Validated on industry-scale data
Abstract
In this work, we present a novel way of computing inverse propensity scores (IPS) for deterministic logging policies using a position-bias model. This technique significantly widens the range of policies on which offline policy evaluation (OPE) can be used. We validate the technique with two experiments on industry-scale data. The OPE results are strongly correlated with the online results, with some constant bias. The estimator requires the examination model to be a reasonably accurate approximation of real user behaviour.
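To make the idea concrete, here is a minimal sketch of how a position-bias model could stand in for propensities when the logging policy is deterministic. It is not the authors' estimator, whose exact form is not given in this summary. It assumes a simple position-based examination model in which a click at a rank is reweighted by the ratio of examination probabilities between the target ranking and the logged ranking. The names `LoggedImpression`, `examination_prob`, `ips_clicks_estimate`, and the power-law curve with exponent `eta` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LoggedImpression:
    ranking: List[str]  # items in the order the logging policy displayed them
    clicks: List[int]   # 1 if the item at that rank was clicked, else 0


def examination_prob(rank: int, eta: float = 1.0) -> float:
    """Assumed position-bias curve: P(examined | rank) = (1 / (rank + 1)) ** eta.

    `rank` is 0-based. In practice this curve would be fitted to real data.
    """
    return (1.0 / (rank + 1)) ** eta


def ips_clicks_estimate(
    logs: Sequence[LoggedImpression],
    target_rankings: Sequence[List[str]],
    eta: float = 1.0,
) -> float:
    """Estimate expected clicks per impression under the target rankings.

    Each logged click is weighted by
    examination_prob(target rank) / examination_prob(logged rank),
    so the examination probability plays the role of the propensity.
    """
    total = 0.0
    for impression, new_ranking in zip(logs, target_rankings):
        new_rank = {item: r for r, item in enumerate(new_ranking)}
        for logged_rank, (item, click) in enumerate(
            zip(impression.ranking, impression.clicks)
        ):
            # Unclicked items contribute nothing; items the target policy
            # does not show are treated as never examined.
            if not click or item not in new_rank:
                continue
            weight = examination_prob(new_rank[item], eta) / examination_prob(
                logged_rank, eta
            )
            total += click * weight
    return total / len(logs)


logs = [LoggedImpression(ranking=["a", "b"], clicks=[0, 1])]
same_policy = ips_clicks_estimate(logs, [["a", "b"]])
swapped_policy = ips_clicks_estimate(logs, [["b", "a"]])
```

In this toy case, replaying the logged ranking returns the observed click rate, while moving the clicked item from the second slot to the first doubles the estimate because the assumed curve says the top slot is examined twice as often. The estimate is only as good as the examination curve, which matches the abstract's caveat about needing an accurate examination model.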
Taxonomy
Topics: Game Theory and Voting Systems · Auction Theory and Applications · Recommender Systems and Techniques
