BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language
Nikolay Banar, Ehsan Lotfi, Walter Daelemans

TL;DR
BEIR-NL is a Dutch version of the BEIR benchmark, built by automatically translating its publicly accessible datasets. It enables zero-shot IR evaluation for Dutch, and a comparison of lexical, dense, and reranking models on it shows that BM25 remains competitive and points to limitations introduced by translation.
Contribution
This work creates the first Dutch IR benchmark by translating BEIR datasets, facilitating zero-shot evaluation and analysis of IR models in Dutch.
Findings
BM25 remains a strong baseline in Dutch IR.
Only the larger dense models trained for retrieval outperform BM25.
Translation impacts dataset quality and model performance.
Abstract
Zero-shot evaluation of information retrieval (IR) models is often performed using BEIR, a large and heterogeneous benchmark composed of multiple datasets, covering different retrieval tasks across various domains. Although BEIR has become a standard benchmark for the zero-shot setup, its exclusively English content reduces its utility for underrepresented languages in IR, including Dutch. To address this limitation and encourage the development of Dutch IR models, we introduce BEIR-NL by automatically translating the publicly accessible BEIR datasets into Dutch. Using BEIR-NL, we evaluated a wide range of multilingual dense ranking and reranking models, as well as the lexical BM25 method. Our experiments show that BM25 remains a competitive baseline, and is only outperformed by the larger dense models trained for retrieval. When combined with reranking models, BM25 achieves performance…
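For readers unfamiliar with the lexical baseline named in the abstract, the following is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the authors' implementation: the whitespace tokenizer, the parameter values k1 = 1.2 and b = 0.75, the class name BM25, and the three toy Dutch documents are all assumptions chosen for illustration, and none of them come from BEIR-NL.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch);
    # a real system would use language-aware analysis for Dutch.
    return text.lower().split()


class BM25:
    """Toy in-memory Okapi BM25 index. Hypothetical helper, not from the paper."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1 = k1
        self.b = b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in self.doc_tokens]
        self.avgdl = sum(self.doc_len) / len(self.doc_len)
        self.tf = [Counter(t) for t in self.doc_tokens]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for tokens in self.doc_tokens:
            df.update(set(tokens))
        n = len(docs)
        # Smoothed IDF variant that is always non-negative.
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5))
            for term, f in df.items()
        }

    def score(self, query, i):
        # Sum the per-term BM25 contributions for document i.
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * self.doc_len[i] / self.avgdl
            total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def rank(self, query):
        # Return (doc_index, score) pairs sorted by descending score.
        scores = [(i, self.score(query, i)) for i in range(len(self.doc_tokens))]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Illustrative documents only; these are not taken from BEIR-NL.
docs = [
    "de kat zit op de mat",
    "amsterdam is de hoofdstad van nederland",
    "de hond speelt in de tuin",
]
bm25 = BM25(docs)
ranking = bm25.rank("hoofdstad van nederland")
```

Here `ranking` places the second document first, because it is the only one that shares terms with the query. In a retrieve-then-rerank setup such as the one the abstract mentions, a list like this would be the candidate set passed to a reranking model.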
Taxonomy
Topics: Semantic Web and Ontologies · Information Retrieval and Search Behavior · Data Quality and Management
