Data-Driven Relevance Judgments for Ranking Evaluation
Nuno Moniz, Luís Torgo, João Vinagre

TL;DR
This paper introduces a data-driven method to improve ranking evaluation metrics by accounting for score divergence, leading to more accurate relevance judgments and better ranking assessments.
Contribution
It proposes a novel relevance function based on score divergence, enhancing the accuracy of ranking evaluation metrics like nDCG.
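For reference, one common formulation of nDCG at cutoff $k$ is shown below, where $rel_i$ is the graded relevance of the item at position $i$. The exponential gain $2^{rel_i} - 1$ is a widespread choice but an assumption here, since the paper may use a linear gain $rel_i$ instead. As described in the abstract, the proposal targets how the relevance values $rel_i$ are derived.

```latex
\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)},
\qquad
\mathrm{nDCG}@k = \frac{\mathrm{DCG}@k}{\mathrm{IDCG}@k}
```

Here $\mathrm{IDCG}@k$ is the $\mathrm{DCG}@k$ of the ideal ordering, i.e. the items sorted by decreasing $rel_i$, so $\mathrm{nDCG}@k \in [0, 1]$.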
Findings
The proposed method provides more fine-grained ranking evaluations.
Standard nDCG can under- or over-estimate the evaluation score, depending on the degree of divergence between the scores of the ranked items.
Synthetic and real-world data demonstrate the improved performance of the new approach.
Abstract
Ranking evaluation metrics are a fundamental element of design and improvement efforts in information retrieval. We observe that most popular metrics disregard the information conveyed by the scores used to derive rankings, when such scores are available. This may pose a numerical scaling problem, causing an under- or over-estimation of the evaluation depending on the degree of divergence between the scores of ranked items. The purpose of this work is to propose a principled way of quantifying multi-graded relevance judgments of items and to enable a more accurate penalization of ordering errors in rankings. We propose a data-driven generation of relevance functions based on the degree of divergence among a set of items' scores, and its application in the evaluation metric Normalized Discounted Cumulative Gain (nDCG). We use synthetic data to demonstrate the usefulness of our proposal and a combination of…
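The scaling problem described above can be illustrated with a small Python sketch that scores the same ordering error (the top two items swapped) under small and large score divergence. Everything here is hypothetical: `ordinal_relevance` grades items by rank only, and `minmax_relevance` is a simple score-aware stand-in (min-max rescaling), not the data-driven relevance function proposed in the paper, whose exact form is not given in this summary. The item names, scores, exponential gain, and `max_grade=2` are illustrative assumptions.

```python
import math


def dcg(relevances):
    """DCG with exponential gain (2**rel - 1) and a log2(position + 1) discount."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(relevances):
    """nDCG: DCG of the given order divided by DCG of the ideal (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def ordinal_relevance(scores, max_grade=2):
    """Hypothetical baseline: grade items by rank only, ignoring score gaps."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: max(max_grade - rank, 0) for rank, item in enumerate(ordered)}


def minmax_relevance(scores, max_grade=2):
    """Hypothetical score-aware stand-in: rescale scores linearly to [0, max_grade]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores tied: every item gets the top grade
        return {item: float(max_grade) for item in scores}
    return {item: max_grade * (s - lo) / (hi - lo) for item, s in scores.items()}


def evaluate(scores, predicted_order, relevance_fn):
    """nDCG of a predicted order, with relevance derived from true scores by relevance_fn."""
    rel = relevance_fn(scores)
    return ndcg([rel[item] for item in predicted_order])


# Illustrative true scores: "b" is nearly tied with "a" in one case, far below it in the other.
close = {"a": 10.0, "b": 9.9, "c": 1.0}
far = {"a": 10.0, "b": 1.1, "c": 1.0}
predicted = ["b", "a", "c"]  # same ordering error in both cases

results = {
    (name, fn.__name__): evaluate(scores, predicted, fn)
    for name, scores in [("close", close), ("far", far)]
    for fn in (ordinal_relevance, minmax_relevance)
}
for key, value in results.items():
    print(key, round(value, 3))
```

With these toy numbers, the rank-only grades yield the same nDCG in both cases, whereas the score-aware stand-in barely penalizes the swap when the scores are nearly tied and penalizes it much more when they diverge widely.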
Taxonomy
Topics: Information Retrieval and Search Behavior · Data Management and Algorithms · Text and Document Classification Technologies
