BEIR-NL: Zero-shot Information Retrieval Benchmark for the Dutch Language

Nikolay Banar; Ehsan Lotfi; Walter Daelemans

arXiv:2412.08329 · cs.CL · December 12, 2024

TL;DR

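BEIR-NL is a Dutch version of the BEIR benchmark, built by automatically translating the publicly accessible BEIR datasets. It enables zero-shot IR evaluation for Dutch and is used to compare multilingual dense ranking and reranking models against BM25, showing that BM25 remains competitive and that translation has limitations.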

Contribution

This work creates the first Dutch IR benchmark by translating BEIR datasets, facilitating zero-shot evaluation and analysis of IR models in Dutch.

Findings

01

BM25 remains a strong baseline in Dutch IR.

02

Only the larger dense models trained for retrieval outperform BM25, and not always by a significant margin.

03

Translation impacts dataset quality and model performance.

Abstract

Zero-shot evaluation of information retrieval (IR) models is often performed using BEIR, a large and heterogeneous benchmark composed of multiple datasets that cover different retrieval tasks across various domains. Although BEIR has become a standard benchmark for the zero-shot setup, its exclusively English content reduces its utility for underrepresented languages in IR, including Dutch. To address this limitation and encourage the development of Dutch IR models, we introduce BEIR-NL by automatically translating the publicly accessible BEIR datasets into Dutch. Using BEIR-NL, we evaluated a wide range of multilingual dense ranking and reranking models, as well as the lexical BM25 method. Our experiments show that BM25 remains a competitive baseline, and is only outperformed by the larger dense models trained for retrieval. When combined with reranking models, BM25 achieves performance…
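
For readers unfamiliar with the lexical baseline named in the abstract, the following is a minimal sketch of BM25 scoring over a toy Dutch corpus. It is not the authors' implementation: the whitespace tokenizer, the parameter values `k1=1.2` and `b=0.75`, the helper names `tokenize`, `bm25_scores`, and `retrieve`, and the three example documents are all assumptions chosen for illustration. In a retrieve-then-rerank setup, a reranking model would rescore the top results returned by `retrieve`; that step is omitted here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. A real system would
    # typically use a Dutch analyzer (stemming, stop words).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve(query, docs, top_k=2):
    """Return the indices of the top_k documents, best first."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Hypothetical toy corpus, not taken from BEIR-NL.
DOCS = [
    "de kat zit op de mat",
    "informatie ophalen met zoekmachines in het nederlands",
    "het weer in antwerpen is vandaag zonnig",
]

if __name__ == "__main__":
    print(retrieve("zoekmachines nederlands", DOCS))
```

Because BM25 only matches surface tokens, a query scores zero against any document that shares no terms with it, which is the gap that dense and reranking models are meant to close.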

Taxonomy

Topics: Semantic Web and Ontologies · Information Retrieval and Search Behavior · Data Quality and Management