mRobust04: A Multilingual Version of the TREC Robust 2004 Benchmark
Vitor Jeronymo, Mauricio Nascimento, Roberto Lotufo, Rodrigo Nogueira

TL;DR
This paper introduces mRobust04, a multilingual extension of the Robust 2004 information retrieval benchmark, translated into eight languages, and evaluates three multilingual retrieval methods on it.
Contribution
It presents the first multilingual version of Robust04, enabling evaluation of multilingual retrieval systems across eight languages.
Findings
Multilingual retrievers show varying performance across languages.
The dataset is publicly available for further research.
Machine translation with Google Translate offers a practical way to turn an existing English benchmark into a multilingual one.
Abstract
Robust 2004 is an information retrieval benchmark whose large number of judgments per query makes it a reliable evaluation dataset. In this paper, we present mRobust04, a multilingual version of Robust04 that was translated into 8 languages using Google Translate. We also provide results of three different multilingual retrievers on this dataset. The dataset is available at https://huggingface.co/datasets/unicamp-dl/mrobust
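Robust04's value as a benchmark comes from scoring retrievers against many relevance judgments per query. The sketch below shows that kind of judgment-based scoring and assumes nDCG@20 with linear gains, a metric often reported on Robust04. This summary does not say which metric the authors use, and the function names and toy document ids are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    # Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=20):
    """nDCG@k for a single query (hypothetical helper, linear gains assumed).

    ranked_doc_ids: document ids in the order the retriever returned them.
    qrels: dict mapping doc id -> relevance judgment (0, 1, 2, ...).
    Unjudged documents are treated as non-relevant (gain 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # Ideal ranking: all judged documents sorted by decreasing relevance.
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(gains, k) / ideal_dcg


def mean_ndcg(run, qrels, k=20):
    """Average nDCG@k over every judged query; queries missing from the run score 0.

    run: dict query id -> ranked list of doc ids.
    qrels: dict query id -> {doc id: relevance}.
    """
    if not qrels:
        return 0.0
    scores = [ndcg_at_k(run.get(qid, []), judged, k) for qid, judged in qrels.items()]
    return sum(scores) / len(scores)
```

For example, `ndcg_at_k(["d2", "d1", "d4"], {"d1": 2, "d2": 1, "d3": 0})` returns about 0.8597: both relevant documents are retrieved, but the more relevant one is ranked second. Treating unjudged documents as non-relevant is the usual convention, which is why a pool with many judgments per query makes the scores more trustworthy.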
