Relevance Judgment Convergence Degree -- A Measure of Inconsistency among Assessors for Information Retrieval
Dengya Zhu, Shastri L Nimmagadda, Kok Wai Wong, Torsten Reiners

TL;DR
This paper introduces the Relevance Judgment Convergence Degree (RJCD), a metric that quantifies inconsistency among assessors' relevance judgments (an inconsistency that affects how IR systems are evaluated), and shows that RJCD correlates strongly with differences in system performance.
Contribution
The paper proposes a new metric, RJCD, to measure assessor judgment inconsistency and shows its effectiveness in evaluating IR system performance.
Findings
RJCD correlates strongly with IR system performance differences.
Inconsistency among assessors affects IR evaluation outcomes.
RJCD can serve as a quality measure for relevance judgments.
Abstract
The relevance judgments of human assessors are inherently subjective and dynamic when evaluation datasets are created for Information Retrieval (IR) systems. However, the relevance judgments of a small group of experts are usually taken as ground truth to "objectively" evaluate the performance of IR systems. A recent trend is to employ a group of judges, for example through outsourcing, to alleviate the potentially biased judgments that stem from relying on a single expert. Nevertheless, different judges may hold different opinions and may not agree with each other, and this inconsistency in human relevance judgment may affect IR system evaluation results. In this research, we introduce the Relevance Judgment Convergence Degree (RJCD) to measure the quality of queries in evaluation datasets. Experimental results reveal a strong correlation coefficient between the proposed…
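The abstract reports a strong correlation coefficient between RJCD and differences in system performance, but the summary does not define RJCD or name the coefficient used. The sketch below assumes Pearson's r and shows only the correlation step: given one convergence score per query (which could be an RJCD value) and one performance gap per query (for example, the difference in an evaluation metric between two systems), it measures how strongly the two move together. The function name `pearson_r` and all input values are hypothetical toy data, not results from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (proportional to the sample covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each sample's variance)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-query inputs (toy numbers, NOT data from the paper):
# convergence[i] is an assumed per-query agreement score (e.g. an RJCD value)
# perf_gap[i] is an assumed metric difference between two systems on query i
convergence = [0.9, 0.7, 0.5, 0.3]
perf_gap = [0.05, 0.10, 0.18, 0.25]

r = pearson_r(convergence, perf_gap)
print(f"Pearson r = {r:.3f}")
```

Pearson's r captures only linear association; if the paper used a rank-based coefficient such as Spearman's or Kendall's, the same per-query pairing would apply with ranks in place of raw values.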
Taxonomy
Topics: Information Retrieval and Search Behavior · Expert finding and Q&A systems · Recommender Systems and Techniques
