Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard
Ehsan Kamalloo, Nandan Thakur, Carlos Lassance, Xueguang Ma, Jheng-Hong Yang, Jimmy Lin

TL;DR
This paper introduces resources for the BEIR benchmark, including reproducible reference implementations of retrieval models and an official leaderboard, to enhance reproducibility, comparability, and research in zero-shot information retrieval across diverse domains.
Contribution
It provides reproducible reference implementations for dense and sparse retrieval models and establishes an official BEIR leaderboard for consistent model evaluation.
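To make the dense/sparse distinction concrete, here is a minimal toy sketch, not the paper's reference implementations. The function names, the example documents, and the hand-written two-dimensional "embeddings" are all hypothetical; real sparse models use weighted term statistics (e.g., BM25) and real dense models use learned encoders.

```python
from collections import Counter


def sparse_score(query: str, doc: str) -> int:
    """Toy sparse score: sum of query-term count times doc-term count."""
    q_terms = Counter(query.lower().split())
    d_terms = Counter(doc.lower().split())
    # Counter returns 0 for terms absent from the document.
    return sum(q_terms[t] * d_terms[t] for t in q_terms)


def dense_score(q_vec: list[float], d_vec: list[float]) -> float:
    """Toy dense score: dot product of query and document vectors."""
    return sum(a * b for a, b in zip(q_vec, d_vec))


def rank(scores: dict[str, float]) -> list[str]:
    """Return document ids sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical two-document corpus (assumption, for illustration only).
docs = {"d1": "heart attack symptoms", "d2": "contract law basics"}
query = "heart attack"

sparse_ranking = rank({d: sparse_score(query, text) for d, text in docs.items()})

# Hand-made vectors standing in for encoder outputs (assumption).
doc_vecs = {"d1": [0.9, 0.1], "d2": [0.1, 0.9]}
query_vec = [1.0, 0.0]
dense_ranking = rank({d: dense_score(query_vec, v) for d, v in doc_vecs.items()})
```

Both toy scorers place `d1` above `d2` for this query; the point is only that sparse scoring matches surface terms while dense scoring compares vector representations.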
Findings
Reproducible implementations ease entry for new researchers.
The leaderboard enables fair comparison of retrieval models.
The resources facilitate future research in domain-specific information retrieval.
Abstract
BEIR is a benchmark dataset for zero-shot evaluation of information retrieval models across 18 different domain/task combinations. In recent years, we have witnessed the growing popularity of a representation learning approach to building retrieval models, typically using pretrained transformers in a supervised setting. This naturally begs the question: How effective are these models when presented with queries and documents that differ from the training data? Examples include searching in different domains (e.g., medical or legal text) and with different types of queries (e.g., keywords vs. well-formed questions). While BEIR was designed to answer these questions, our work addresses two shortcomings that prevent the benchmark from achieving its full potential: First, the sophistication of modern neural methods and the complexity of current software infrastructure create barriers to…
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
