Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

Jimmy Lin; Xueguang Ma; Sheng-Chieh Lin; Jheng-Hong Yang; Ronak Pradeep; Rodrigo Nogueira

arXiv:2102.10073·cs.IR·February 22, 2021·31 cites

TL;DR

Pyserini is a user-friendly Python toolkit that facilitates replicable information retrieval research by supporting various retrieval methods, including sparse, dense, and hybrid approaches, with comprehensive resources and evaluation tools.

Contribution

The paper introduces Pyserini, a comprehensive Python toolkit that simplifies IR research and ensures reproducibility through pre-built resources and support for multiple retrieval techniques.

Findings

1. Empirical results illustrate the toolkit's effectiveness on two popular ranking tasks.

2. Supports sparse, dense, and hybrid retrieval methods.

3. Enables rigorous automated testing for reproducibility.

Abstract

Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. We also…
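To make the three retrieval modes named in the abstract concrete, here is a minimal, self-contained sketch in plain Python. It is not Pyserini's API or implementation: the function names (`bm25_scores`, `dense_scores`, `hybrid_rank`), the parameter values `k1` and `b`, the hand-written toy vectors standing in for transformer encodings, and the simple weighted-sum fusion with weight `alpha` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Sparse retrieval sketch: BM25 over whitespace-tokenized bag-of-words.

    k1 and b are illustrative values, not settings taken from the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def dense_scores(query_vec, doc_vecs):
    """Dense retrieval sketch: brute-force inner product against every document."""
    return [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]


def hybrid_rank(sparse, dense, alpha=0.5):
    """Hybrid sketch: rank document indices by alpha * sparse + (1 - alpha) * dense.

    The weighted sum is an assumed fusion rule for illustration only.
    """
    combined = [alpha * s + (1 - alpha) * d for s, d in zip(sparse, dense)]
    return sorted(range(len(combined)), key=lambda i: -combined[i])


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "dogs chase cats in the park",
        "information retrieval with bm25 scoring",
    ]
    sparse = bm25_scores("bm25 retrieval", docs)
    # Toy 2-d vectors standing in for encoder output (hypothetical values).
    dense = dense_scores([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
    print(hybrid_rank(sparse, dense, alpha=0.5))
```

In practice the sparse and dense scores live on different scales, so a real system would typically normalize or tune the interpolation weight on held-out data; the sketch leaves that out to keep the fusion step visible.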

Code & Models

Repositories

castorini/pyserini (official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications