Sensitive and Scalable Online Evaluation with Theoretical Guarantees

Harrie Oosterhuis; Maarten de Rijke

arXiv:1711.09454·cs.IR·November 28, 2017

TL;DR

This paper introduces a theoretical framework for evaluating multileaved comparison methods in ranking systems, and proposes PPM, a new method that is both considerate of user experience and reliable, demonstrating improved sensitivity and scalability.

Contribution

It provides a systematic framework for comparing multileaved methods and introduces PPM, a novel approach with proven considerateness and fidelity.

Findings

1. PPM is more sensitive to user preferences.
2. PPM scales better with the number of rankers.
3. PPM maintains user experience during evaluation.

Abstract

Multileaved comparison methods generalize interleaved comparison methods to provide a scalable approach for comparing ranking systems based on regular user interactions. Such methods enable the increasingly rapid research and development of search engines. However, existing multileaved comparison methods that provide reliable outcomes do so by degrading the user experience during evaluation. Conversely, current multileaved comparison methods that maintain the user experience cannot guarantee correctness. Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliable correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM), that performs…

Code & Models

Repositories

HarrieO/PairwisePreferenceMultileave (Official)
