End-to-End Retrieval with Learned Dense and Sparse Representations Using Lucene
Haonan Chen, Carlos Lassance, and Jimmy Lin

TL;DR
This paper demonstrates that Lucene can perform end-to-end retrieval with modern dense and sparse neural representations efficiently on the CPU, simplifying the integration of such models into existing IR systems.
Contribution
It shows that Lucene alone is sufficient to support modern dense and sparse neural retrieval models with minimal changes to existing infrastructure.
Findings
Effective retrieval with neural models can be performed directly in Java on the CPU.
A Lucene-based implementation simplifies the integration of dense and sparse representations.
The approach efficiently supports both research and production IR systems.
Abstract
The bi-encoder architecture provides a framework for understanding machine-learned retrieval models based on dense and sparse vector representations. Although these representations capture parametric realizations of the same underlying conceptual framework, their respective implementations of top-k similarity search require the coordination of different software components (e.g., inverted indexes, HNSW indexes, and toolkits for neural inference), often knitted together in complex architectures. In this work, we ask the following question: What's the simplest design, in terms of requiring the fewest changes to existing infrastructure, that can support end-to-end retrieval with modern dense and sparse representations? The answer appears to be that Lucene is sufficient, as we demonstrate in Anserini, a toolkit for reproducible information retrieval research. That is, effective retrieval…
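The abstract's point that dense and sparse bi-encoders are two realizations of one framework can be made concrete. In both cases, relevance is the inner product of a query vector and a document vector, and retrieval is top-k selection over those scores. The Java sketch that follows is a minimal illustration under our own assumptions, not code from Anserini or Lucene. The class and method names (`BiEncoderSketch`, `denseScore`, `buildIndex`, `sparseScores`, `topK`) and the toy vectors are hypothetical. Dense search here is exhaustive rather than going through an HNSW index, and sparse search uses a bare in-memory inverted index in place of Lucene's.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch, not Anserini or Lucene code. In a bi-encoder, a query and a
// document are scored by the inner product of their vectors, whether those vectors
// are dense (float[]) or sparse (term -> learned weight).
public class BiEncoderSketch {

    // Dense scoring: dot product of two equal-length embeddings.
    static double denseScore(float[] q, float[] d) {
        double s = 0.0;
        for (int i = 0; i < q.length; i++) {
            s += (double) q[i] * d[i];
        }
        return s;
    }

    // One postings entry: a document id and the term's weight in that document.
    static final class Posting {
        final int doc;
        final float weight;

        Posting(int doc, float weight) {
            this.doc = doc;
            this.weight = weight;
        }
    }

    // Builds an in-memory inverted index (term -> postings) from sparse document vectors.
    static Map<String, List<Posting>> buildIndex(List<Map<String, Float>> docs) {
        Map<String, List<Posting>> index = new HashMap<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            for (Map.Entry<String, Float> e : docs.get(doc).entrySet()) {
                index.computeIfAbsent(e.getKey(), t -> new ArrayList<>())
                     .add(new Posting(doc, e.getValue()));
            }
        }
        return index;
    }

    // Sparse scoring: traverse the postings of each query term and accumulate
    // query weight * document weight, which equals the sparse dot product.
    static double[] sparseScores(Map<String, Float> query,
                                 Map<String, List<Posting>> index, int numDocs) {
        double[] scores = new double[numDocs];
        for (Map.Entry<String, Float> e : query.entrySet()) {
            for (Posting p : index.getOrDefault(e.getKey(), List.of())) {
                scores[p.doc] += e.getValue().doubleValue() * p.weight;
            }
        }
        return scores;
    }

    // Top-k selection with a size-k min-heap; returns doc ids, best first.
    // Ties are broken in favor of the lower doc id.
    static List<Integer> topK(double[] scores, int k) {
        Comparator<Integer> worstFirst = (a, b) -> {
            int c = Double.compare(scores[a], scores[b]);
            return c != 0 ? c : Integer.compare(b, a);
        };
        PriorityQueue<Integer> heap = new PriorityQueue<>(worstFirst);
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(doc);
            if (heap.size() > k) {
                heap.poll(); // evict the current worst candidate
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(worstFirst.reversed());
        return result;
    }

    public static void main(String[] args) {
        // Dense: three toy document embeddings and one query embedding.
        float[][] denseDocs = { {0.1f, 0.9f}, {0.8f, 0.2f}, {0.5f, 0.5f} };
        float[] denseQuery = {1.0f, 0.0f};
        double[] dense = new double[denseDocs.length];
        for (int i = 0; i < denseDocs.length; i++) {
            dense[i] = denseScore(denseQuery, denseDocs[i]);
        }
        System.out.println("dense top-2: " + topK(dense, 2));

        // Sparse: toy learned term weights, as a sparse encoder might produce.
        List<Map<String, Float>> sparseDocs = List.of(
            Map.of("lucene", 1.2f, "index", 0.4f),
            Map.of("hnsw", 0.9f, "graph", 0.7f),
            Map.of("lucene", 0.3f, "hnsw", 0.5f));
        Map<String, Float> sparseQuery = Map.of("lucene", 1.0f, "hnsw", 0.5f);
        Map<String, List<Posting>> index = buildIndex(sparseDocs);
        double[] sparse = sparseScores(sparseQuery, index, sparseDocs.size());
        System.out.println("sparse top-2: " + topK(sparse, 2));
    }
}
```

Running `main` prints `dense top-2: [1, 2]` and `sparse top-2: [0, 2]`. The only difference between the two paths is how candidates are reached: the dense loop scores every document exactly, whereas an HNSW index would trade some recall for speed, and the sparse path touches only the postings of the query's terms.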
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Neural Networks and Applications · Machine Learning in Materials Science
