Shallow pooling for sparse labels
Negar Arabzadeh, Alexandra Vtyurina, Xinyi Yan, Charles L. A. Clarke

TL;DR
This paper investigates the limits of evaluating neural rankers on datasets with sparse relevance labels, such as MS MARCO. It finds that the top-ranked results often appear superior to the judged relevant items, which calls into question the validity of benchmarks built on those labels.
Contribution
The paper uses crowdsourced preference judgments to compare a ranker's top results against the judged relevant items, highlighting potential flaws in current evaluation practice.
Findings
Top results from neural rankers are often preferred over the judged relevant items.
Sparsely labeled datasets such as MS MARCO may not accurately reflect true ranking quality.
Preference judgments suggest the need for better evaluation standards.
Abstract
Recent years have seen enormous gains in core IR tasks, including document and passage ranking. Datasets and leaderboards, and in particular the MS MARCO datasets, illustrate the dramatic improvements achieved by modern neural rankers. When compared with traditional test collections, the MS MARCO datasets employ substantially more queries with substantially fewer known relevant items per query. Given the sparsity of these relevance labels, the MS MARCO leaderboards track improvements with mean reciprocal rank (MRR). In essence, a relevant item is treated as the "right answer", with rankers scored on their ability to place this item high in the ranking. In working with these sparse labels, we have observed that the top items returned by a ranker often appear superior to judged relevant items. To test this observation, we employed crowdsourced workers to make preference judgments between…
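The abstract describes scoring a ranker by how high it places the single judged "right answer". A minimal sketch of that computation follows. The passage IDs, the function names, and the cutoff of 10 are illustrative assumptions, not details taken from the text above. The first query shows how a ranker is penalized when it places an unjudged passage ahead of the judged one, even if the unjudged passage is the better answer.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranking: List[str], relevant: Set[str], cutoff: int = 10) -> float:
    """Return 1/rank of the first judged relevant item within the cutoff, else 0."""
    for rank, item_id in enumerate(ranking[:cutoff], start=1):
        if item_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
    cutoff: int = 10,  # assumed cutoff, chosen for illustration
) -> float:
    """Average the reciprocal rank over all queries in the run."""
    if not runs:
        return 0.0
    total = sum(
        reciprocal_rank(ranking, qrels.get(query_id, set()), cutoff)
        for query_id, ranking in runs.items()
    )
    return total / len(runs)


# Hypothetical sparse labels: one judged relevant passage per query.
qrels = {"q1": {"p7"}, "q2": {"p3"}}

# Hypothetical ranker output. For q1 the unjudged passage "p2" is ranked
# first, so the query scores 0.5 regardless of how good "p2" actually is.
runs = {"q1": ["p2", "p7", "p9"], "q2": ["p3", "p5", "p8"]}

mrr = mean_reciprocal_rank(runs, qrels)
print(mrr)  # (1/2 + 1/1) / 2 = 0.75
```

Because unjudged items count as non-relevant here, a single label per query is enough to compute the metric, but the score cannot distinguish a poor top result from one that is simply unjudged.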
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks · Stochastic Gradient Optimization Techniques
