Corrected Evaluation Results of the NTCIR WWW-2, WWW-3, and WWW-4 English Subtasks
Tetsuya Sakai, Sijie Tao, Maria Maistro, Zhumin Chu, Yujing Li, Nuo Chen, Nicola Ferro, Junjie Wang, Ian Soboroff, and Yiqun Liu

TL;DR
This paper corrects the evaluation results of the NTCIR WWW-2, WWW-3, and WWW-4 tasks by fixing a bug in the relevance assessment interface, providing accurate results for future research and comparisons.
Contribution
It identifies and corrects a critical bug in the relevance labels used in NTCIR WWW tasks, ensuring accurate evaluation results for these benchmarks.
Findings
Corrected relevance labels for WWW-2, WWW-3, and WWW-4
Revealed the impact of the bug on previous evaluation results
Provided accurate benchmark results for future research
Abstract
Unfortunately, the official English (sub)task results reported in the NTCIR-14 WWW-2, NTCIR-15 WWW-3, and NTCIR-16 WWW-4 overview papers are incorrect due to noise in the official qrels files; this paper reports results based on the corrected qrels files. The noise is due to a fatal bug in the backend of our relevance assessment interface. More specifically, at WWW-2, WWW-3, and WWW-4, two versions of pool files were created for each English topic: a PRI ("prioritised") file, which uses the NTCIRPOOL script to prioritise likely relevant documents, and a RND ("randomised") file, which randomises the pooled documents. This was done for the purpose of studying the effect of document ordering for relevance assessors. However, the programmer who wrote the interface backend assumed that a combination of a topic ID and a document rank in the pool file uniquely determines a document ID; this is…
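The faulty assumption described in the abstract can be illustrated with a minimal sketch. Everything in it is hypothetical: the topic ID, the document IDs, the fixed permutation standing in for the randomised order, and the helper `index_by_rank` are invented for illustration and are not taken from the actual interface backend. It only shows why a (topic ID, rank) pair stops identifying a single document once two differently ordered pool files exist for the same topic.

```python
# Hypothetical topic and pooled documents (illustrative IDs only).
topic_id = "0001"

# PRI file: documents in an assumed prioritised order.
pri_file = ["docA", "docB", "docC", "docD"]

# RND file: the same documents in a different order.
# A fixed permutation stands in for the randomisation so the sketch is repeatable.
rnd_file = ["docC", "docA", "docD", "docB"]


def index_by_rank(topic, ordered_docs):
    """Map (topic ID, 1-based rank) to the document ID at that rank."""
    return {(topic, rank): doc for rank, doc in enumerate(ordered_docs, start=1)}


pri_index = index_by_rank(topic_id, pri_file)
rnd_index = index_by_rank(topic_id, rnd_file)

# Keys where the two pool files disagree about which document is meant.
conflicts = {
    key: (pri_index[key], rnd_index[key])
    for key in pri_index
    if pri_index[key] != rnd_index[key]
}

# Keying labels by (topic ID, document ID) does not depend on the ordering.
pri_keys = {(topic_id, doc) for doc in pri_file}
rnd_keys = {(topic_id, doc) for doc in rnd_file}

print(len(conflicts), "of", len(pri_index), "rank keys are ambiguous")
print("document-ID keys agree:", pri_keys == rnd_keys)
```

In this toy setup every rank key resolves to a different document depending on which pool file is consulted, whereas keys built from the document ID itself are identical for both files.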
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Semantic Web and Ontologies
Methods: Test
