Evaluating D-MERIT of Partial-annotation on Information Retrieval

Royi Rassin; Yaron Fairstein; Oren Kalinsky; Guy Kushilevitz; Nachshon Cohen; Alexander Libov; Yoav Goldberg

arXiv:2406.16048 · cs.IR · October 15, 2024

TL;DR

This paper shows that evaluating retrieval systems on partially annotated data can produce misleading system rankings, and introduces D-MERIT, a Wikipedia-based passage retrieval evaluation set that aims to annotate all relevant passages for each query.

Contribution

The study highlights how partial annotations affect retrieval evaluation and provides D-MERIT, an evaluation set with more complete relevance annotations that allows retrieval systems to be compared more reliably.

Findings

1. Partial annotations can distort retrieval system rankings.
2. Including more relevant passages in evaluation sets improves ranking stability.
3. The D-MERIT dataset offers a more comprehensive evaluation resource.
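The findings above concern whether evaluating on partial versus complete annotations orders systems the same way. One standard way to measure that agreement is Kendall's tau over all pairs of systems; it is assumed here purely for illustration. The system names and scores below are hypothetical, and this simple tau-a variant assumes no tied scores.

```python
from itertools import combinations


def kendall_tau(scores_x: dict[str, float], scores_y: dict[str, float]) -> float:
    """Kendall tau-a between two orderings of the same systems (assumes no ties)."""
    systems = sorted(scores_x)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Same sign in both score lists -> the pair is ordered the same way.
        product = (scores_x[s] - scores_x[t]) * (scores_y[s] - scores_y[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical metric scores for four systems under each annotation set.
complete = {"system_1": 0.31, "system_2": 0.45, "system_3": 0.52, "system_4": 0.58}
partial = {"system_1": 0.40, "system_2": 0.38, "system_3": 0.36, "system_4": 0.44}

print(f"Kendall tau: {kendall_tau(partial, complete):.2f}")
```

With these made-up scores, three pairs of systems are ordered the same way and three are swapped, so tau is 0.00; a value of 1.0 would mean the partial annotations reproduce the complete-annotation ranking exactly.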

Abstract

Retrieval models are often evaluated on partially annotated datasets. Each query is mapped to a few relevant texts and the remaining corpus is assumed to be irrelevant. As a result, models that successfully retrieve false negatives are punished in evaluation. Unfortunately, completely annotating all texts for every query is not resource-efficient. In this work, we show that using partially annotated datasets in evaluation can paint a distorted picture. We curate D-MERIT, a passage retrieval evaluation set from Wikipedia, aspiring to contain all relevant passages for each query. Queries describe a group (e.g., "journals about linguistics") and relevant passages are evidence that entities belong to the group (e.g., a passage indicating that "Language" is a journal about linguistics). We show that evaluating on a dataset containing annotations for only a subset of the relevant passages…
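The abstract's core argument, that a model retrieving unannotated relevant passages (false negatives) is penalized, can be made concrete with a toy example. This is a hypothetical sketch: the passage IDs, the two systems, and the choice of precision@5 as the metric are illustrative assumptions, not data or metrics taken from D-MERIT.

```python
"""Toy illustration: partial relevance annotations can flip a system ranking.

All passage IDs, systems, and annotations below are hypothetical.
"""


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved passages that are annotated as relevant."""
    top_k = ranked[:k]
    return sum(1 for pid in top_k if pid in relevant) / k


# Complete annotation: every passage that is actually relevant to the query.
complete_qrels = {"p1", "p2", "p3", "p4", "p5"}
# Partial annotation: only a subset was labeled; p3, p4, p5 become false negatives.
partial_qrels = {"p1", "p2"}

# Hypothetical ranked outputs of two retrieval systems for the same query.
system_a = ["p1", "p2", "x1", "x2", "x3"]  # finds the labeled passages, then misses
system_b = ["p1", "p3", "p4", "p5", "x1"]  # finds many unlabeled relevant passages

for name, ranking in [("A", system_a), ("B", system_b)]:
    partial = precision_at_k(ranking, partial_qrels, k=5)
    complete = precision_at_k(ranking, complete_qrels, k=5)
    print(f"system {name}: P@5 partial={partial:.2f} complete={complete:.2f}")
```

Here system A scores higher under the partial labels (0.40 vs 0.20), but system B scores higher once every relevant passage is labeled (0.80 vs 0.40), so the ranking of the two systems flips.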
