Multi-Dueling Bandits and Their Application to Online Ranker Evaluation
Brian Brost, Yevgeny Seldin, Ingemar J. Cox, Christina Lioma

TL;DR
This paper introduces a generalized dueling bandits model for online ranker evaluation, enabling simultaneous comparisons of multiple rankers, and demonstrates significant performance improvements over existing methods.
Contribution
We propose a new dueling bandits algorithm that compares multiple rankers simultaneously, addressing the open problem of selecting which rankers to compare at each iteration.
Findings
Significant performance improvements over state-of-the-art algorithms.
Effective evaluation on synthetic and real-world datasets.
Orders of magnitude faster convergence in identifying the best ranker.
Abstract
New ranking algorithms are continually being developed and refined, necessitating the development of efficient methods for evaluating these rankers. Online ranker evaluation focuses on the challenge of efficiently determining, from implicit user feedback, which ranker out of a finite set of rankers is the best. Online ranker evaluation can be modeled by dueling bandits, a mathematical model for online learning under limited feedback from pairwise comparisons. Comparisons of pairs of rankers are performed by interleaving their result sets and examining which documents users click on. The dueling bandits model addresses the key issue of which pair of rankers to compare at each iteration, thereby providing a solution to the exploration-exploitation trade-off. Recently, methods for simultaneously comparing more than two rankers have been developed. However, the question of which rankers to…
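To make the pairwise-feedback setting concrete, the following is a minimal sketch of a classical two-ranker dueling bandit simulation. It is not the multi-dueling algorithm proposed in the paper. The preference matrix `PREF`, the helper names `duel` and `empirical_winner`, and the uniform-random choice of which pair to compare are all illustrative assumptions; a real system would obtain each duel outcome from an interleaved result list and user clicks, and would choose pairs adaptively.

```python
import random


def duel(pref, i, j, rng):
    """Simulate one noisy comparison: ranker i wins with probability pref[i][j]."""
    return i if rng.random() < pref[i][j] else j


def empirical_winner(pref, rounds, seed=0):
    """Run `rounds` duels between uniformly chosen pairs and return the ranker
    that beats the most opponents in head-to-head win counts."""
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]  # wins[a][b]: times a beat b

    for _ in range(rounds):
        # Assumption: pure uniform exploration, no exploitation.
        i, j = rng.sample(range(k), 2)
        winner = duel(pref, i, j, rng)
        loser = j if winner == i else i
        wins[winner][loser] += 1

    def beats(a, b):
        return wins[a][b] > wins[b][a]

    scores = [sum(beats(a, b) for b in range(k) if b != a) for a in range(k)]
    return max(range(k), key=lambda a: scores[a])


# Hypothetical preference matrix: PREF[i][j] is the probability that
# ranker i wins a comparison against ranker j. Ranker 0 is the best.
PREF = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]

best = empirical_winner(PREF, rounds=3000)
print("estimated best ranker:", best)
```

Uniform pair selection keeps comparing clearly inferior rankers for the whole run. Choosing the pair adaptively is what dueling bandit algorithms address, and the abstract's open question concerns making that choice when more than two rankers can be compared at once.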
