NeuCLIRTech: Chinese Monolingual and Cross-Language Information Retrieval Evaluation in a Challenging Domain
Dawn Lawrie, James Mayfield, Eugene Yang, Andrew Yates, Sean MacAvaney, Ronak Pradeep, Scott Miller, Paul McNamee, Luca Soldaini

TL;DR
NeuCLIRTech is a new evaluation collection for Chinese monolingual and cross-language information retrieval in a challenging technical domain. It pairs technical documents written natively in Chinese with their English machine translations, and combines the TREC NeuCLIR track topics of 2023 and 2024 into 110 queries with relevance judgments.
Contribution
The paper introduces NeuCLIRTech, a retrieval evaluation collection of Chinese technical documents with relevance judgments for both monolingual and cross-language scenarios. It combines the 2023 and 2024 TREC NeuCLIR track topics and includes a fusion baseline of strong neural retrieval systems.
Findings
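110 queries with 35,962 document judgments give strong statistical power for distinguishing retrieval approaches
A fusion baseline of strong neural retrieval systems is included for developers of reranking algorithms
Two retrieval scenarios are supported: monolingual retrieval in Chinese, and cross-language retrieval with English queries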
Abstract
Measuring advances in retrieval requires test collections with relevance judgments that can faithfully distinguish systems. This paper presents NeuCLIRTech, an evaluation collection for cross-language retrieval over technical information. The collection consists of technical documents written natively in Chinese and those same documents machine translated into English. It includes 110 queries with relevance judgments. The collection supports two retrieval scenarios: monolingual retrieval in Chinese, and cross-language retrieval with English as the query language. NeuCLIRTech combines the TREC NeuCLIR track topics of 2023 and 2024. The 110 queries with 35,962 document judgments provide strong statistical discriminatory power when trying to distinguish retrieval approaches. A fusion baseline of strong neural retrieval systems is included so that developers of reranking algorithms are not…
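The abstract does not say how the fusion baseline combines its component systems. As an illustration only, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge ranked lists from several retrieval systems. The function name, the run names, the document ids, and the constant k=60 are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document id so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ranked lists from three systems for a single query.
run_a = ["d1", "d2", "d3"]
run_b = ["d2", "d3", "d1"]
run_c = ["d2", "d1", "d4"]

fused = reciprocal_rank_fusion([run_a, run_b, run_c])
print(fused)
```

In a reranking setup, a fused list like this would serve as the candidate set that a reranker reorders, so every reranker is compared on the same first-stage input.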
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Biomedical Text Mining and Ontologies
