Corrected Evaluation Results of the NTCIR WWW-2, WWW-3, and WWW-4 English Subtasks

Tetsuya Sakai, Sijie Tao, Maria Maistro, Zhumin Chu, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, and Yiqun Liu

arXiv:2210.10266·cs.IR·October 20, 2022

TL;DR

This paper reports corrected evaluation results for the English subtasks of NTCIR WWW-2, WWW-3, and WWW-4. The official results were computed from noisy qrels files; the noise came from a bug in the backend of the relevance assessment interface, and the results here are based on corrected qrels files.

Contribution

It traces the noise in the official qrels files of the NTCIR WWW English subtasks to a bug in the backend of the relevance assessment interface, and reports evaluation results based on corrected qrels files.

Findings

01. Corrected relevance labels for WWW-2, WWW-3, and WWW-4

02. Revealed the impact of the bug on previous evaluation results

03. Provided accurate benchmark results for future research

Abstract

Unfortunately, the official English (sub)task results reported in the NTCIR-14 WWW-2, NTCIR-15 WWW-3, and NTCIR-16 WWW-4 overview papers are incorrect due to noise in the official qrels files; this paper reports results based on the corrected qrels files. The noise is due to a fatal bug in the backend of our relevance assessment interface. More specifically, at WWW-2, WWW-3, and WWW-4, two versions of pool files were created for each English topic: a PRI ("prioritised") file, which uses the NTCIRPOOL script to prioritise likely relevant documents, and a RND ("randomised") file, which randomises the pooled documents. This was done for the purpose of studying the effect of document ordering for relevance assessors. However, the programmer who wrote the interface backend assumed that a combination of a topic ID and a document rank in the pool file uniquely determines a document ID; this is…
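The flawed assumption described in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical: the topic ID, the document IDs, the two orderings, and the helper names are invented for illustration, and the code is not the actual interface backend. It only shows why a lookup keyed by (topic ID, rank) alone is ambiguous once two differently ordered pool files exist for the same topic.

```python
# Hypothetical pool files for one topic: same documents, different orderings.
pri_pool = {"0001": ["docA", "docB", "docC"]}  # PRI: prioritised order (assumed)
rnd_pool = {"0001": ["docC", "docA", "docB"]}  # RND: randomised order (assumed)


def build_rank_index(pool):
    """Map (topic_id, rank) -> doc_id for a single pool file (ranks start at 1)."""
    return {
        (topic_id, rank): doc_id
        for topic_id, docs in pool.items()
        for rank, doc_id in enumerate(docs, start=1)
    }


pri_index = build_rank_index(pri_pool)
rnd_index = build_rank_index(rnd_pool)

# Flawed design: a single lookup keyed only by (topic_id, rank).
# Merging the two indexes silently overwrites PRI entries with RND entries.
shared_index = {**pri_index, **rnd_index}


def resolve_flawed(topic_id, rank):
    """Resolve a document ID without knowing which pool file was shown."""
    return shared_index[(topic_id, rank)]


def resolve_with_pool(pool_type, topic_id, rank):
    """Resolve a document ID using the pool type as part of the key."""
    index = pri_index if pool_type == "PRI" else rnd_index
    return index[(topic_id, rank)]


# An assessor working from the PRI file judges the document shown at rank 1.
judged_doc = resolve_with_pool("PRI", "0001", 1)  # the document actually seen
recorded_doc = resolve_flawed("0001", 1)          # the document the label lands on

# Ranks at which the two pool files disagree about the document ID.
mismatches = [key for key in pri_index if pri_index[key] != rnd_index[key]]

print(judged_doc, recorded_doc, len(mismatches))
```

In this toy setup the label given to one document is attached to a different one, which is one way such an assumption could introduce noise into a qrels file. Including the pool type in the key, as `resolve_with_pool` does, removes the ambiguity.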

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Semantic Web and Ontologies

Methods: Test