NeuCLIRBench: A Modern Evaluation Collection for Monolingual, Cross-Language, and Multilingual Information Retrieval

Dawn Lawrie; James Mayfield; Eugene Yang; Andrew Yates; Sean MacAvaney; Ronak Pradeep; Scott Miller; Paul McNamee; Luca Soldaini

arXiv:2511.14758·cs.IR·November 19, 2025

TL;DR

NeuCLIRBench is an evaluation collection for monolingual, cross-language, and multilingual information retrieval. It contains documents written natively in Chinese, Persian, and Russian, plus machine translations of those documents into English, with extensive relevance judgments for robust system comparison.

Contribution

The paper introduces a test collection that combines the TREC NeuCLIR track topics of 2022, 2023, and 2024, covers monolingual, cross-language, and multilingual retrieval scenarios, and includes a strong neural baseline for comparing retrieval systems.

Findings

01. Supports diverse retrieval scenarios including monolingual, cross-language, and multilingual tasks.

02. Contains over 250,000 relevance judgments across approximately 150 queries.

03. Includes a strong neural retrieval baseline for improved system evaluation.

Abstract

To measure advances in retrieval, test collections with relevance judgments that can faithfully distinguish systems are required. This paper presents NeuCLIRBench, an evaluation collection for cross-language and multilingual retrieval. The collection consists of documents written natively in Chinese, Persian, and Russian, as well as those same documents machine translated into English. The collection supports several retrieval scenarios including: monolingual retrieval in English, Chinese, Persian, or Russian; cross-language retrieval with English as the query language and one of the other three languages as the document language; and multilingual retrieval, again with English as the query language and relevant documents in all three languages. NeuCLIRBench combines the TREC NeuCLIR track topics of 2022, 2023, and 2024. The 250,128 judgments across approximately 150 queries for the…
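The retrieval scenarios listed in the abstract can be enumerated in a short sketch. This is only an illustration of how the scenarios combine query and document languages, not code from the paper or the dataset. The `Scenario` class, the `build_scenarios` function, and the two-letter language codes are hypothetical names chosen for this example. It also assumes that monolingual retrieval pairs queries and documents in the same language, with English monolingual retrieval running over the machine-translated documents.

```python
from dataclasses import dataclass

# Document languages named in the abstract: Chinese, Persian, Russian.
# The two-letter codes are an assumption of this sketch.
DOC_LANGS = ("zh", "fa", "ru")


@dataclass(frozen=True)
class Scenario:
    """Hypothetical description of one retrieval scenario."""
    kind: str          # "monolingual", "cross-language", or "multilingual"
    query_lang: str    # language of the queries
    doc_langs: tuple   # language(s) of the documents searched


def build_scenarios():
    """Enumerate the scenarios described in the abstract."""
    scenarios = []
    # Monolingual: queries and documents share one language.
    # English is assumed to use the machine-translated documents.
    for lang in ("en",) + DOC_LANGS:
        scenarios.append(Scenario("monolingual", lang, (lang,)))
    # Cross-language: English queries, one non-English document language.
    for lang in DOC_LANGS:
        scenarios.append(Scenario("cross-language", "en", (lang,)))
    # Multilingual: English queries, relevant documents in all three languages.
    scenarios.append(Scenario("multilingual", "en", DOC_LANGS))
    return scenarios


if __name__ == "__main__":
    for s in build_scenarios():
        print(f"{s.kind:15s} query={s.query_lang} docs={','.join(s.doc_langs)}")
```

Under these assumptions the enumeration yields four monolingual, three cross-language, and one multilingual scenario.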

Code & Models

Datasets

neuclir/bench
dataset · 772 downloads

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Biomedical Text Mining and Ontologies