Data-Driven Relevance Judgments for Ranking Evaluation

Nuno Moniz; Luís Torgo; João Vinagre

arXiv:1612.06136 · cs.IR · December 20, 2016

TL;DR

This paper introduces a data-driven way to derive relevance judgments from the scores behind a ranking, so that evaluation metrics such as nDCG penalize ordering errors according to how far the items' scores diverge.

Contribution

It proposes generating relevance functions from the data, based on the degree of divergence among the items' scores, and applies them in the evaluation metric Normalized Discounted Cumulative Gain (nDCG).

Findings

01 The proposed relevance functions allow a more fine-grained evaluation of rankings.

02 Standard nDCG can under- or over-estimate ranking quality, depending on how far the scores of the ranked items diverge.

03 Experiments on synthetic and real-world data illustrate the benefit of the new approach.

Abstract

Ranking evaluation metrics are a fundamental element of design and improvement efforts in information retrieval. We observe that most popular metrics disregard information portrayed in the scores used to derive rankings, when available. This may pose a numerical scaling problem, causing an under- or over-estimation of the evaluation depending on the degree of divergence between the scores of ranked items. The purpose of this work is to propose a principled way of quantifying multi-graded relevance judgments of items and enable a more accurate penalization of ordering errors in rankings. We propose a data-driven generation of relevance functions based on the degree of the divergence amongst a set of items' scores and its application in the evaluation metric Normalized Discounted Cumulative Gain (nDCG). We use synthetic data to demonstrate the interest of our proposal and a combination of…
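To make the scaling problem described in the abstract concrete, here is a minimal Python sketch. It is not the authors' relevance function, which this summary does not specify. As a stand-in, it assumes a simple min-max scaling of the true scores (`minmax_relevance`, a hypothetical name) and compares it with a score-blind baseline that assigns gains by rank position only (`rank_relevance`, also hypothetical). The DCG discount used is the common 1 / log2(position + 1).

```python
import math

def dcg(gains):
    """Discounted cumulative gain: gain at position p (1-based) is divided by log2(p + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(true_scores, predicted_order, relevance):
    """nDCG of predicted_order (a list of item indices, best first).

    relevance is a function mapping the list of true scores to one gain per item.
    """
    gains = relevance(true_scores)
    ideal_dcg = dcg(sorted(gains, reverse=True))
    if ideal_dcg == 0:
        return 0.0
    return dcg([gains[i] for i in predicted_order]) / ideal_dcg

def rank_relevance(scores):
    """Score-blind baseline: gain n for the top item down to 1 for the last."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    gains = [0] * n
    for pos, i in enumerate(order):
        gains[i] = n - pos
    return gains

def minmax_relevance(scores):
    """ASSUMPTION: illustrative score-aware gain, min-max scaled to [0, 1].

    This is a stand-in for illustration only, not the paper's relevance function.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Two item sets with the same true ordering but different score divergence.
close = [10.0, 9.9, 1.0]   # top two items are nearly tied
spread = [10.0, 5.0, 1.0]  # top two items are far apart
swapped = [1, 0, 2]        # a predicted ranking that swaps the top two items

for name, scores in (("close", close), ("spread", spread)):
    print(name,
          "rank-based:", round(ndcg(scores, swapped, rank_relevance), 4),
          "score-aware:", round(ndcg(scores, swapped, minmax_relevance), 4))
```

Under the rank-based gains, the swap costs the same in both cases, even though confusing two nearly tied items is arguably a smaller error than confusing two clearly separated ones. The score-aware stand-in penalizes the swap less when the scores are close and more when they are spread out, which is the kind of behavior the abstract argues for.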

Taxonomy

Topics: Information Retrieval and Search Behavior · Data Management and Algorithms · Text and Document Classification Technologies