Shallow pooling for sparse labels

Negar Arabzadeh; Alexandra Vtyurina; Xinyi Yan; Charles L. A. Clarke

arXiv:2109.00062 · cs.IR · March 2, 2022 · 1 citation

TL;DR

This paper examines how well MRR-based evaluation of neural rankers holds up on sparsely labeled datasets such as MS MARCO. The top items returned by a ranker often appear superior to the judged relevant items, which calls the validity of existing benchmarks into question.

Contribution

It introduces a crowdsourced preference judgment approach to compare top-ranked results with judged relevant items, highlighting potential flaws in current evaluation practices.

Findings

1. Top results from neural rankers often appear superior to the judged relevant items.

2. Current sparsely labeled datasets may not accurately reflect true ranking quality.

3. Preference judgments suggest the need for better evaluation standards.

Abstract

Recent years have seen enormous gains in core IR tasks, including document and passage ranking. Datasets and leaderboards, and in particular the MS MARCO datasets, illustrate the dramatic improvements achieved by modern neural rankers. When compared with traditional test collections, the MS MARCO datasets employ substantially more queries with substantially fewer known relevant items per query. Given the sparsity of these relevance labels, the MS MARCO leaderboards track improvements with mean reciprocal rank (MRR). In essence, a relevant item is treated as the "right answer", with rankers scored on their ability to place this item high in the ranking. In working with these sparse labels, we have observed that the top items returned by a ranker often appear superior to judged relevant items. To test this observation, we employed crowdsourced workers to make preference judgments between…
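To make the "right answer" scoring described in the abstract concrete, here is a minimal sketch of mean reciprocal rank over sparse labels. The function names, the toy query and passage IDs, and the `cutoff` parameter are illustrative assumptions, not taken from the paper or from any MS MARCO tooling.

```python
from typing import Dict, List, Optional, Set


def reciprocal_rank(ranking: List[str], relevant: Set[str],
                    cutoff: Optional[int] = None) -> float:
    """Return 1/rank of the first judged-relevant item, or 0.0 if none appears."""
    considered = ranking if cutoff is None else ranking[:cutoff]
    for position, item_id in enumerate(considered, start=1):
        if item_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Dict[str, List[str]],
                         qrels: Dict[str, Set[str]],
                         cutoff: Optional[int] = None) -> float:
    """Average the reciprocal rank over every query that has a label."""
    if not qrels:
        return 0.0
    total = sum(reciprocal_rank(runs.get(qid, []), relevant, cutoff)
                for qid, relevant in qrels.items())
    return total / len(qrels)


# Hypothetical sparse labels: exactly one judged relevant passage per query.
qrels = {"q1": {"p7"}, "q2": {"p3"}}
# Hypothetical ranker output, best first.
runs = {"q1": ["p7", "p2", "p9"], "q2": ["p8", "p3", "p1"]}

score = mean_reciprocal_rank(runs, qrels, cutoff=10)
print(score)  # 0.75
```

For `q2`, the unjudged passage `p8` is ranked above the labeled passage `p3`, so the query contributes only 0.5. If `p8` were in fact the better answer, the metric would still penalize the ranker, which is the concern the abstract raises about sparse labels.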

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks · Stochastic Gradient Optimization Techniques