An Improved Multileaving Algorithm for Online Ranker Evaluation

Brian Brost; Ingemar J. Cox; Yevgeny Seldin; Christina Lioma

arXiv:1608.00788·cs.IR·August 3, 2016

Open Access

TL;DR

This paper introduces an improved multileaving algorithm for online ranker evaluation that better accounts for ranker similarities, leading to more accurate and scalable preference inference from user feedback.

Contribution

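The paper proposes a new multileaving method that targets two weaknesses of existing methods: poor scaling with the number of rankers, and outcomes that diverge from measures like NDCG because similarities between the multileaved rankers are not accounted for. The summary reports that it outperforms existing methods.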

Findings

01 Reduces evaluation errors by up to 50%

02 Improves scalability with multiple rankers

03 Produces results more aligned with NDCG measures

Abstract

Online ranker evaluation is a key challenge in information retrieval. An important task in the online evaluation of rankers is using implicit user feedback for inferring preferences between rankers. Interleaving methods have been found to be efficient and sensitive, i.e. they can quickly detect even small differences in quality. It has recently been shown that multileaving methods exhibit similar sensitivity but can be more efficient than interleaving methods. This paper presents empirical results demonstrating that existing multileaving methods either do not scale well with the number of rankers, or, more problematically, can produce results which substantially differ from evaluation measures like NDCG. The latter problem is caused by the fact that they do not correctly account for the similarities that can occur between rankers being multileaved. We propose a new multileaving method…
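The abstract uses NDCG as the offline reference against which multileaving outcomes are compared. As a reminder of what that measure computes, here is a minimal sketch in Python. It is not taken from the paper: the exponential gain `2^rel - 1`, the `log2(rank + 1)` discount, and the function names `dcg` and `ndcg` are assumptions chosen for illustration, and the ideal ranking is built from the same list of labels, which assumes the list contains every judged document for the query.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions.

    Assumes exponential gain (2^rel - 1) and a log2 position discount;
    other formulations exist and the paper may use a different one.
    """
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)  # position is 0-based
        for position, rel in enumerate(relevances[:k])
    )


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the ideal (descending) ordering.

    The ideal ordering is derived from the same labels, so this assumes
    `relevances` covers all judged documents for the query.
    """
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels for the same documents as ordered by two rankers.
ranker_a = [3, 2, 0, 1]
ranker_b = [0, 1, 2, 3]
a_is_better = ndcg(ranker_a, k=4) > ndcg(ranker_b, k=4)
```

An online method such as multileaving is judged accurate, in the sense the abstract describes, when the preferences it infers from clicks agree with the ordering such an offline measure induces over the rankers.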

Taxonomy

Topics: Expert finding and Q&A systems · Information Retrieval and Search Behavior · Recommender Systems and Techniques