LSTM-based Selective Dense Text Retrieval Guided by Sparse Lexical Retrieval
Yingrui Yang, Parker Carlson, Yifan Qiao, Wentai Xie, Shanxiu He, and Tao Yang

TL;DR
This paper introduces CluSD, a fast and memory-efficient method that combines dense and sparse retrieval techniques using clustering and LSTM guidance to improve search efficiency on large datasets.
Contribution
It proposes a novel cluster-based selective dense retrieval approach guided by sparse lexical retrieval, enhancing speed and reducing memory overhead.
Findings
CluSD outperforms baseline methods in search speed and accuracy.
It effectively reduces memory usage during retrieval.
CluSD demonstrates strong performance on MS MARCO and BEIR datasets.
Abstract
This paper studies fast fusion of dense retrieval and sparse lexical retrieval, and proposes a cluster-based selective dense retrieval method called CluSD guided by sparse lexical retrieval. CluSD takes a lightweight cluster-based approach and exploits the overlap of sparse retrieval results and embedding clusters in a two-stage selection process with an LSTM model to quickly identify relevant clusters while incurring limited extra memory space overhead. CluSD triggers partial dense retrieval and performs cluster-based block disk I/O if needed. This paper evaluates CluSD and compares it with several baselines for searching in-memory and on-disk MS MARCO and BEIR datasets.
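The selection flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the overlap-count ranking used for the first stage, and the `score_fn` callable standing in for the LSTM model in the second stage are all hypothetical, and the toy data is made up for the example.

```python
from collections import Counter


def rank_clusters_by_overlap(sparse_doc_ids, doc_to_cluster, max_candidates):
    """Stage 1 (assumed heuristic): rank clusters by how many sparse hits they contain."""
    counts = Counter(
        doc_to_cluster[d] for d in sparse_doc_ids if d in doc_to_cluster
    )
    # Sort by overlap count (descending), then cluster id for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:max_candidates]


def select_clusters(candidates, score_fn, threshold):
    """Stage 2 placeholder: the paper uses an LSTM here; score_fn is a stand-in."""
    return [c for c, overlap in candidates if score_fn(c, overlap) >= threshold]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def partial_dense_retrieval(query_vec, selected, cluster_to_docs, doc_vecs):
    """Score only the documents that live in the selected clusters."""
    scores = {}
    for c in selected:
        for d in cluster_to_docs[c]:
            scores[d] = dot(query_vec, doc_vecs[d])
    return scores


# Toy data (hypothetical): six documents spread over three embedding clusters.
doc_to_cluster = {"d1": 0, "d2": 0, "d3": 1, "d4": 2, "d5": 2, "d6": 2}
cluster_to_docs = {0: ["d1", "d2"], 1: ["d3"], 2: ["d4", "d5", "d6"]}
doc_vecs = {
    "d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.5, 0.5],
    "d4": [0.2, 0.8], "d5": [0.9, 0.1], "d6": [0.3, 0.3],
}
sparse_top = ["d1", "d2", "d4", "d3"]  # ids returned by a sparse retriever

candidates = rank_clusters_by_overlap(sparse_top, doc_to_cluster, max_candidates=2)
# Stand-in scorer: fraction of sparse hits that fall in the cluster.
selected = select_clusters(
    candidates, lambda c, n: n / len(sparse_top), threshold=0.4
)
dense_scores = partial_dense_retrieval([1.0, 0.0], selected, cluster_to_docs, doc_vecs)
print(candidates, selected, dense_scores)
```

In this sketch only the clusters that survive both stages are scored against the query, which is what keeps dense retrieval partial; in an on-disk setting each selected cluster would correspond to one block read.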
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Topic Modeling
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
