Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, and Rodrigo Nogueira

TL;DR
Pyserini is a user-friendly Python toolkit for replicable information retrieval research. It supports sparse, dense, and hybrid retrieval and ships with pre-built resources and evaluation tools.
Contribution
The paper introduces Pyserini, a self-contained Python toolkit that simplifies IR research and supports replicability by bundling pre-built resources (queries, relevance judgments, indexes, and evaluation scripts) with multiple retrieval techniques.
Findings
Effective on two popular ranking tasks
Supports sparse, dense, and hybrid retrieval methods
Enables rigorous automated testing for reproducibility
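To make the idea of automated replicability testing concrete, here is a minimal sketch, assuming a run is held in memory as a ranked list of docids per query and the relevance judgments (qrels) as a set of relevant docids per query. The helpers `mrr_at_k` and `check_regression`, the tolerance, and the toy data are hypothetical illustrations, not Pyserini's evaluation scripts. The check scores a run with MRR@10 and fails if the score drifts from a previously recorded value.

```python
# Hypothetical sketch of a replicability check: score a run against
# relevance judgments and compare it to a previously recorded value.
from typing import Dict, List, Set


def mrr_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]],
             k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant docid within the top k."""
    total = 0.0
    for qid, ranked in run.items():
        relevant = qrels.get(qid, set())
        for rank, docid in enumerate(ranked[:k], start=1):
            if docid in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    # Averaged over the queries in the run; queries with no hit contribute 0.
    return total / len(run) if run else 0.0


def check_regression(observed: float, expected: float,
                     tol: float = 1e-4) -> bool:
    """True if the observed score matches the recorded score within tol."""
    return abs(observed - expected) <= tol


# Toy run and judgments (made-up data for illustration only).
run = {
    "q1": ["d3", "d1", "d7"],  # first relevant doc (d1) at rank 2
    "q2": ["d5", "d2", "d9"],  # first relevant doc (d5) at rank 1
}
qrels = {"q1": {"d1"}, "q2": {"d5", "d9"}}

score = mrr_at_k(run, qrels, k=10)
print(f"MRR@10 = {score:.4f}")  # (1/2 + 1/1) / 2 = 0.75
assert check_regression(score, expected=0.75)
```

In a real regression suite the expected value would come from a documented, previously verified run rather than being typed in by hand.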
Abstract
Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. We also…
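The three retrieval modes described in the abstract can be illustrated with a small pure-Python sketch. It is not Pyserini's implementation (Pyserini builds on Lucene for sparse retrieval and Faiss for dense retrieval). The function names, the BM25 parameters k1 = 0.9 and b = 0.4, the toy documents, and the hand-written 2-d vectors standing in for transformer encoder outputs are all assumptions. Sparse retrieval scores documents with BM25 over bag-of-words tokens, dense retrieval runs exhaustive inner-product nearest-neighbor search, and hybrid retrieval fuses the two score lists with a weighted sum, which is one common fusion choice among several.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bm25_scores(query: str, docs: Dict[str, str],
                k1: float = 0.9, b: float = 0.4) -> Dict[str, float]:
    """Sparse retrieval: BM25 over lowercased, whitespace-split bag-of-words."""
    tokenized = {docid: text.lower().split() for docid, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for docid, toks in tokenized.items():
        tf = Counter(toks)
        # Length normalization: longer-than-average docs are penalized via b.
        norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Lucene-style idf, which stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[docid] = score
    return scores


def dense_scores(query_vec: List[float],
                 doc_vecs: Dict[str, List[float]]) -> Dict[str, float]:
    """Dense retrieval: exhaustive (brute-force) inner-product search."""
    return {docid: sum(q * x for q, x in zip(query_vec, vec))
            for docid, vec in doc_vecs.items()}


def hybrid_scores(sparse: Dict[str, float], dense: Dict[str, float],
                  alpha: float = 0.5) -> Dict[str, float]:
    """Hybrid retrieval: dense score + alpha * sparse score.

    A doc missing from one list is assigned that list's minimum score
    (an assumption; other fill-in strategies are possible).
    """
    s_min = min(sparse.values(), default=0.0)
    d_min = min(dense.values(), default=0.0)
    return {docid: dense.get(docid, d_min) + alpha * sparse.get(docid, s_min)
            for docid in set(sparse) | set(dense)}


def top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (docid, score) pairs."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy collection and made-up 2-d vectors standing in for encoder outputs.
docs = {
    "d1": "sparse retrieval with bm25",
    "d2": "dense retrieval with transformer encoders",
    "d3": "hybrid retrieval combines sparse and dense",
}
doc_vecs = {"d1": [0.7, 0.1], "d2": [0.1, 0.9], "d3": [0.8, 0.6]}
query, query_vec = "sparse retrieval", [1.0, 0.0]

sparse = bm25_scores(query, docs)
dense = dense_scores(query_vec, doc_vecs)
hybrid = hybrid_scores(sparse, dense, alpha=0.5)
for name, scores in [("sparse", sparse), ("dense", dense), ("hybrid", hybrid)]:
    print(name, top_k(scores, 3))
```

With these toy numbers, BM25 ranks d1 first, while the dense and hybrid scores put d3 on top. Because BM25 scores and inner products live on different scales, the weight alpha (and whether to normalize scores before fusing) is a tuning choice; 0.5 here is arbitrary.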
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications
