Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search
Leonid Boytsov, David Novak, Yury Malkov, Eric Nyberg

TL;DR
This paper proposes replacing term-based candidate retrieval with an approximate k-NN search whose similarity function accounts for subtle term associations. The resulting pipeline can be both more effective and more efficient than a term-based one.
Contribution
It replaces the term-based candidate search with a generic k-NN retrieval algorithm and shows that a pipeline built on approximate k-NN search can be more effective and more efficient than a term-based pipeline.
Findings
Approximate k-NN search can be nearly two orders of magnitude faster than exact brute-force k-NN search, at the cost of only a small loss in accuracy.
The k-NN approach captures subtle term associations missed by term-based methods.
A retrieval pipeline using approximate k-NN search can be more effective and more efficient than a term-based pipeline.
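An approximate search trades some accuracy for speed. This summary does not say which accuracy measure the paper reports, so the sketch below assumes one common choice: recall of the approximate result list against the exact k nearest neighbors. The function name `recall_at_k` and the example IDs are hypothetical.

```python
from typing import Iterable


def recall_at_k(exact_ids: Iterable[int], approx_ids: Iterable[int]) -> float:
    """Fraction of the exact k nearest neighbors that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        # Nothing to find, so nothing was missed.
        return 1.0
    return len(exact & set(approx_ids)) / len(exact)


# Illustrative IDs only: the approximate search found 3 of the 4 true neighbors.
example_recall = recall_at_k([1, 2, 3, 4], [1, 2, 3, 9])
```

A recall close to 1.0 would correspond to the "small loss in accuracy" case; how much loss is acceptable depends on the re-ranking stage that follows.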
Abstract
Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection is…
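The idea of an exact brute-force k-NN search under a similarity that rewards associated terms can be illustrated with a minimal sketch. It is not the authors' implementation: the association table `ASSOC`, its weights, the `similarity` function, and the toy documents are all assumptions made for illustration.

```python
import heapq
from typing import Callable, List, Sequence

# Hypothetical association weights between a query term and a document term.
ASSOC = {
    ("car", "automobile"): 0.8,
    ("car", "vehicle"): 0.5,
}


def similarity(query: Sequence[str], doc: Sequence[str]) -> float:
    """Toy similarity: each query term contributes its best match in the document.

    An exact term match scores 1.0; an associated term scores its ASSOC weight.
    """
    score = 0.0
    for q in query:
        best = 0.0
        for d in doc:
            weight = 1.0 if q == d else ASSOC.get((q, d), 0.0)
            best = max(best, weight)
        score += best
    return score


def brute_force_knn(
    query: Sequence[str],
    docs: Sequence[Sequence[str]],
    k: int,
    sim: Callable[[Sequence[str], Sequence[str]], float] = similarity,
) -> List[int]:
    """Score every document against the query and return the indices of the top k."""
    scored = ((sim(query, doc), i) for i, doc in enumerate(docs))
    return [i for _, i in heapq.nlargest(k, scored)]


docs = [
    ["fast", "automobile", "engine"],
    ["banana", "bread", "recipe"],
    ["car", "engine", "repair"],
    ["vehicle", "insurance"],
]

# A pure term match on "car" would only reach document 2; the association
# weights also surface documents 0 and 3, which use different vocabulary.
top = brute_force_knn(["car"], docs, k=3)
```

Because every document is scored for every query, the cost grows linearly with the collection size, which is the slowness that an approximate k-NN algorithm is meant to avoid.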
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Topic Modeling
