TL;DR
LexBoost enhances lexical document retrieval by integrating dense neighbor information into ranking, achieving improved effectiveness with minimal additional computational cost, and outperforming traditional dense re-ranking methods.
Contribution
The paper introduces LexBoost, a novel method that combines lexical relevance with dense neighbor information to improve retrieval effectiveness efficiently.
Findings
LexBoost outperforms traditional dense re-ranking methods.
The approach is robust across different parameters and dataset constructions.
LexBoost achieves effectiveness comparable to exhaustive dense retrieval with less latency.
Abstract
Sparse retrieval methods like BM25 are based on lexical overlap, focusing on the surface form of the terms that appear in the query and the document. The use of inverted indices in these methods leads to high retrieval efficiency. On the other hand, dense retrieval methods are based on learned dense vectors and, consequently, are effective but comparatively slow. Since sparse and dense methods approach problems differently and use complementary relevance signals, approximation methods were proposed to balance effectiveness and efficiency. For efficiency, approximation methods like HNSW are frequently used to approximate exhaustive dense retrieval. However, approximation techniques still exhibit considerably higher latency than sparse approaches. We propose LexBoost, which first builds a network of dense neighbors (a corpus graph) using a dense retrieval approach while indexing. Then,…
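The idea described above, a corpus graph built at indexing time and then used to fold dense neighbor information into lexical ranking, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation. The abstract is truncated before the query-time step, so the fusion rule here (a weighted blend of a document's own lexical score and the mean lexical score of its neighbors) is an assumption. The names `build_corpus_graph`, `boosted_scores`, `k`, and `lam` are hypothetical, and brute-force cosine similarity stands in for whatever dense retrieval approach is used to find neighbors.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_corpus_graph(doc_vectors, k):
    """Index time: map each document id to its k nearest dense neighbors.

    Brute force for clarity; an assumption, not the paper's procedure.
    """
    graph = {}
    for doc_id, vec in doc_vectors.items():
        scored = [
            (cosine(vec, other_vec), other_id)
            for other_id, other_vec in doc_vectors.items()
            if other_id != doc_id
        ]
        # Highest similarity first; ties broken by id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        graph[doc_id] = [other_id for _, other_id in scored[:k]]
    return graph


def boosted_scores(lexical_scores, graph, lam=0.7):
    """Query time: blend each document's lexical score with its neighbors'.

    lexical_scores holds e.g. BM25 scores; documents missing from it score 0.
    The blend weight lam is a hypothetical parameter.
    """
    boosted = {}
    for doc_id, neighbors in graph.items():
        own = lexical_scores.get(doc_id, 0.0)
        if neighbors:
            total = sum(lexical_scores.get(n, 0.0) for n in neighbors)
            neighbor_mean = total / len(neighbors)
        else:
            # No neighbors: leave the lexical score unchanged.
            neighbor_mean = own
        boosted[doc_id] = lam * own + (1 - lam) * neighbor_mean
    return boosted
```

In this sketch the query is never encoded into a dense vector: the query-time step needs only lexical scores and a lookup into the precomputed graph, which fits the low additional cost the summary describes. A document with no term overlap with the query can still receive a non-zero score if its dense neighbors match lexically.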
