LexBoost: Improving Lexical Document Retrieval with Nearest Neighbors

Hrishikesh Kulkarni; Nazli Goharian; Ophir Frieder; Sean MacAvaney

arXiv:2409.05882·cs.IR·September 11, 2024

TL;DR

LexBoost enhances lexical document retrieval by integrating dense neighbor information into ranking, achieving improved effectiveness with minimal additional computational cost, and outperforming traditional dense re-ranking methods.

Contribution

The paper introduces LexBoost, a method that combines a document's lexical relevance with information from its dense nearest neighbors to improve retrieval effectiveness at little additional computational cost.

Findings

01 LexBoost outperforms traditional dense re-ranking methods.

02 The approach is robust across different parameters and dataset constructions.

03 LexBoost achieves effectiveness comparable to exhaustive dense retrieval at lower latency.

Abstract

Sparse retrieval methods like BM25 are based on lexical overlap, focusing on the surface form of the terms that appear in the query and the document. The use of inverted indices in these methods leads to high retrieval efficiency. On the other hand, dense retrieval methods are based on learned dense vectors and, consequently, are effective but comparatively slow. Since sparse and dense methods approach problems differently and use complementary relevance signals, approximation methods were proposed to balance effectiveness and efficiency. For efficiency, approximation methods like HNSW are frequently used to approximate exhaustive dense retrieval. However, approximation techniques still exhibit considerably higher latency than sparse approaches. We propose LexBoost, which first builds a network of dense neighbors (a corpus graph) using a dense retrieval approach while indexing. Then,…
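
To make the idea concrete, here is a minimal sketch of the two stages the summary describes: building a corpus graph of dense neighbors at indexing time, then letting each document's neighbors influence its lexical score at query time. The abstract above is cut off before it states how the neighbor information enters the ranking, so the fusion rule here is an assumption: a linear interpolation, with a hypothetical weight `lam`, between a document's own lexical score and the mean lexical score of its neighbors. The brute-force cosine search stands in for whatever dense retrieval approach is actually used, and the function names `build_corpus_graph` and `boost_scores` are illustrative, not taken from the official repository.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_corpus_graph(doc_vectors, k):
    """Indexing time: map each document to its k nearest dense neighbors.

    Brute-force search, used here only as a stand-in for a real dense
    retriever. Assumes k >= 1 and at least k + 1 documents.
    """
    graph = []
    for i, vec in enumerate(doc_vectors):
        others = [j for j in range(len(doc_vectors)) if j != i]
        others.sort(key=lambda j: cosine(vec, doc_vectors[j]), reverse=True)
        graph.append(others[:k])
    return graph


def boost_scores(lexical_scores, graph, lam):
    """Query time: blend each lexical score with its neighbors' mean score.

    ASSUMPTION: linear interpolation with weight `lam`; the source text
    does not state the fusion rule.
    """
    boosted = []
    for i, score in enumerate(lexical_scores):
        neighbor_mean = sum(lexical_scores[j] for j in graph[i]) / len(graph[i])
        boosted.append(lam * score + (1.0 - lam) * neighbor_mean)
    return boosted


# Toy corpus: documents 0 and 1 are close in dense space, as are 2 and 3.
doc_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = build_corpus_graph(doc_vectors, k=1)

# Document 1 shares no terms with the query, so its lexical score is 0.
lexical_scores = [3.0, 0.0, 1.0, 0.5]
boosted = boost_scores(lexical_scores, graph, lam=0.6)

lexical_ranking = sorted(range(4), key=lambda i: lexical_scores[i], reverse=True)
boosted_ranking = sorted(range(4), key=lambda i: boosted[i], reverse=True)
```

In this toy example the lexical-only ranking is `[0, 2, 3, 1]`, while the boosted ranking is `[0, 1, 2, 3]`: document 1 has no term overlap with the query but is promoted because its dense neighbor, document 0, scores highly. Since the graph is computed once at indexing time, the query-time work in this sketch is only a lookup and an average per document.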

Code & Models

Repositories

Georgetown-IR-Lab/LexBoost (official)
