LSTM-based Selective Dense Text Retrieval Guided by Sparse Lexical Retrieval

Yingrui Yang, Parker Carlson, Yifan Qiao, Wentai Xie, Shanxiu He, and Tao Yang

arXiv:2502.10639·cs.IR·February 18, 2025

Open Access

TL;DR

This paper introduces CluSD, a fast and memory-efficient method that combines dense and sparse retrieval techniques using clustering and LSTM guidance to improve search efficiency on large datasets.

Contribution

It proposes a novel cluster-based selective dense retrieval approach guided by sparse lexical retrieval, enhancing speed and reducing memory overhead.

Findings

01 CluSD outperforms baseline methods in search speed and accuracy.

02 It effectively reduces memory usage during retrieval.

03 CluSD demonstrates strong performance on MS MARCO and BEIR datasets.

Abstract

This paper studies fast fusion of dense retrieval and sparse lexical retrieval, and proposes a cluster-based selective dense retrieval method called CluSD guided by sparse lexical retrieval. CluSD takes a lightweight cluster-based approach and exploits the overlap of sparse retrieval results and embedding clusters in a two-stage selection process with an LSTM model to quickly identify relevant clusters while incurring limited extra memory space overhead. CluSD triggers partial dense retrieval and performs cluster-based block disk I/O if needed. This paper evaluates CluSD and compares it with several baselines for searching in-memory and on-disk MS MARCO and BEIR datasets.
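
The selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's second stage uses an LSTM model, which is replaced here by a hand-written score (overlap count plus query-centroid similarity) purely for illustration. The function names, the toy data, the linear interpolation weight `alpha`, and the parameters `n_candidates` and `n_select` are all assumptions that do not come from the paper.

```python
from collections import Counter


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_clusters(query_vec, sparse_hits, doc_cluster, centroids,
                    n_candidates=2, n_select=1):
    """Pick clusters to visit, guided by where the sparse results fall."""
    # Stage 1: shortlist the clusters holding the most sparse top results.
    overlap = Counter(doc_cluster[doc] for doc, _ in sparse_hits)
    candidates = [c for c, _ in overlap.most_common(n_candidates)]
    # Stage 2 (ASSUMPTION): the paper uses an LSTM here; this stand-in
    # ranks candidates by overlap count plus query-centroid similarity.
    ranked = sorted(
        candidates,
        key=lambda c: overlap[c] + dot(query_vec, centroids[c]),
        reverse=True,
    )
    return ranked[:n_select]


def selective_dense_fusion(query_vec, sparse_hits, doc_cluster, doc_vecs,
                           centroids, alpha=0.5):
    """Score only documents in selected clusters, then fuse with sparse scores."""
    selected = set(select_clusters(query_vec, sparse_hits, doc_cluster, centroids))
    sparse_score = dict(sparse_hits)
    fused = {}
    # Partial dense retrieval: documents outside the selected clusters
    # are never scored against the query embedding.
    for doc, cluster in doc_cluster.items():
        if cluster in selected:
            dense = dot(query_vec, doc_vecs[doc])
            fused[doc] = alpha * sparse_score.get(doc, 0.0) + (1 - alpha) * dense
    # Sparse results in unselected clusters keep only their sparse contribution.
    for doc, score in sparse_hits:
        fused.setdefault(doc, alpha * score)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus (hypothetical): six documents in three embedding clusters.
doc_vecs = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0],
            3: [0.1, 0.9], 4: [-1.0, 0.0], 5: [-0.9, -0.1]}
doc_cluster = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
centroids = {0: [0.95, 0.05], 1: [0.05, 0.95], 2: [-0.95, -0.05]}
query_vec = [1.0, 0.0]
sparse_hits = [(1, 1.0), (2, 0.8), (0, 0.5)]  # (doc id, sparse score)

ranking = selective_dense_fusion(query_vec, sparse_hits, doc_cluster,
                                 doc_vecs, centroids)
print(ranking)
```

In this toy run only one of the three clusters is scored densely, which is the source of the speed and I/O savings the abstract describes: on disk, each selected cluster would correspond to one block read rather than a scan of the full embedding set.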

Taxonomy

Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Topic Modeling

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory