Integrating the Probabilistic Models BM25/BM25F into Lucene
Joaquín Pérez-Iglesias, José R. Pérez-Agüera, Víctor Fresno, Yuval Z. Feinstein

TL;DR
This paper details the implementation of BM25 and BM25F probabilistic retrieval models within the Lucene framework, highlighting their state-of-the-art performance in information retrieval tasks for unstructured and structured documents.
Contribution
It introduces the integration of BM25 and BM25F models into Lucene, enabling improved retrieval performance for both plain text and structured documents.
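The plain-text half of that contribution rests on the classic BM25 ranking function. The minimal Python sketch below is not the authors' Lucene Java code. It shows what the function computes under several assumptions: it uses the Robertson/Sparck Jones idf, the common defaults k1 = 1.2 and b = 0.75, and the hypothetical helper names `document_frequencies`, `bm25_idf`, and `bm25_score`.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def document_frequencies(corpus: Iterable[List[str]]) -> Counter:
    """Count, for each term, how many documents contain it at least once."""
    df: Counter = Counter()
    for doc in corpus:
        df.update(set(doc))
    return df


def bm25_idf(num_docs: int, df: int) -> float:
    # Robertson/Sparck Jones idf (an assumed variant); it becomes negative
    # for terms that occur in more than half of the collection.
    return math.log((num_docs - df + 0.5) / (df + 0.5))


def bm25_score(query_terms: List[str], doc_terms: List[str],
               doc_freqs: Dict[str, int], num_docs: int, avg_doc_len: float,
               k1: float = 1.2, b: float = 0.75) -> float:
    """Score one plain-text (field-less) document against a bag of query terms."""
    tf = Counter(doc_terms)
    # Length normalisation: documents longer than average are penalised when b > 0.
    norm = k1 * (1 - b + b * len(doc_terms) / avg_doc_len)
    score = 0.0
    for term in query_terms:
        f = tf[term]
        df = doc_freqs.get(term, 0)
        if f == 0 or df == 0:
            continue  # term absent from this document or from the collection
        # Saturating term-frequency component multiplied by the term's idf.
        score += bm25_idf(num_docs, df) * f * (k1 + 1) / (f + norm)
    return score
```

In use, `document_frequencies(corpus)` and `sum(map(len, corpus)) / len(corpus)` supply the collection statistics that `bm25_score` needs. In this formulation, k1 controls how quickly repeated occurrences of a term stop adding score, and b controls how strongly document length is normalised.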
Findings
BM25 and BM25F are effective state-of-the-art IR models.
Implementation within Lucene facilitates practical use in search applications.
Models perform well on TREC benchmarks.
Abstract
This document describes the BM25 and BM25F implementation using the Lucene Java framework. Both models have stood out at TREC for their performance and are considered state-of-the-art in the IR community. BM25 is applied to retrieval on plain text documents, that is, documents that do not contain fields, while BM25F is applied to documents with structure.
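The distinction between field-less and structured documents is where BM25F differs from BM25. Instead of scoring each field separately and adding the results, BM25F merges per-field term frequencies into a single weight and applies saturation once. Each field's frequency is first normalised by that field's length and scaled by a field boost. The Python sketch below follows a commonly cited BM25F formulation, but several choices are assumptions rather than the authors' Lucene classes: the per-field boost and b dictionaries, document frequencies counted over whole documents, and the name `bm25f_score`.

```python
import math
from typing import Dict, List


def bm25f_score(query_terms: List[str],
                doc_fields: Dict[str, List[str]],
                field_boosts: Dict[str, float],
                field_b: Dict[str, float],
                avg_field_len: Dict[str, float],
                doc_freqs: Dict[str, int],
                num_docs: int,
                k1: float = 1.2) -> float:
    """Score a structured document whose fields (e.g. title, body) are token lists."""
    score = 0.0
    for term in query_terms:
        df = doc_freqs.get(term, 0)  # assumed: counted over whole documents
        if df == 0:
            continue
        # Merge per-field frequencies into one pseudo-frequency, normalising
        # each field by its own length and scaling it by its boost.
        weight = 0.0
        for field, tokens in doc_fields.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            b = field_b[field]
            norm = (1 - b) + b * len(tokens) / avg_field_len[field]
            weight += field_boosts[field] * tf / norm
        if weight > 0:
            # Saturation is applied once, after the fields have been merged.
            idf = math.log((num_docs - df + 0.5) / (df + 0.5))
            score += weight / (k1 + weight) * idf
    return score
```

Merging before saturating means a term that appears in both title and body cannot collect two separately saturated contributions. That is the usual motivation for BM25F over a plain sum of per-field BM25 scores.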
Taxonomy
Topics: Information Retrieval and Search Behavior · Digital Humanities and Scholarship · Library Science and Information Systems
