TL;DR
This paper introduces a theoretical framework for evaluating multileaved comparison methods for ranking systems and proposes Pairwise Preference Multileaving (PPM), a new method that both maintains the user experience (considerateness) and yields reliable outcomes (fidelity), with improved sensitivity and scalability.
Contribution
It provides a systematic framework for comparing multileaved methods and introduces PPM, a novel approach with proven considerateness and fidelity.
Findings
PPM is more sensitive to user preferences than existing multileaved comparison methods.
PPM scales better as the number of rankers being compared grows.
PPM maintains the user experience during evaluation.
Abstract
Multileaved comparison methods generalize interleaved comparison methods to provide a scalable approach for comparing ranking systems based on regular user interactions. Such methods enable the increasingly rapid research and development of search engines. However, existing multileaved comparison methods that provide reliable outcomes do so by degrading the user experience during evaluation. Conversely, current multileaved comparison methods that maintain the user experience cannot guarantee correctness. Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliable correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM), that performs…
