A Study on the Efficiency and Generalization of Light Hybrid Retrievers
Man Luo, Shashank Jain, Anchit Gupta, Arash Einolghozati, Barlas Oguz, Debojeet Chatterjee, Xilun Chen, Chitta Baral, Peyman Heidari

TL;DR
This paper introduces LITE, a memory-efficient dense retriever, and pairs it with BM25 to form a light hybrid retriever that keeps 98% of BM25-DPR performance with 13X less memory and generalizes better out of domain.
Contribution
We propose LITE, a novel lightweight dense retriever trained with contrastive learning and knowledge distillation, enhancing hybrid retrieval efficiency and generalization.
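The joint objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the source does not specify the form of either loss, so the sketch assumes an in-batch-negative contrastive loss and a KL-divergence distillation term over query-passage score distributions. The function names and the `alpha` weight are hypothetical.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def contrastive_loss(q, p):
    # Assumed form: in-batch negatives, passage i is the positive for query i.
    scores = q @ p.T                      # (batch, batch) dot-product similarities
    return -np.mean(np.diag(log_softmax(scores, axis=1)))

def distillation_loss(student_scores, teacher_scores):
    # Assumed form: KL(teacher || student) over each query's score distribution.
    t_logp = log_softmax(teacher_scores, axis=1)
    s_logp = log_softmax(student_scores, axis=1)
    return np.mean(np.sum(np.exp(t_logp) * (t_logp - s_logp), axis=1))

def joint_loss(q_s, p_s, q_t, p_t, alpha=0.5):
    # q_s, p_s: student (LITE-like) embeddings; q_t, p_t: teacher embeddings.
    # alpha is a hypothetical weight balancing the two terms.
    kd = distillation_loss(q_s @ p_s.T, q_t @ p_t.T)
    return contrastive_loss(q_s, p_s) + alpha * kd
```

Because the distillation term compares scores rather than raw embeddings, the student can use a smaller embedding dimension than the teacher, which is where a memory saving would come from in this sketch.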
Findings
Hybrid-LITE uses 13X less memory than the BM25-DPR hybrid retriever.
Hybrid-LITE maintains 98% of BM25-DPR performance.
Light hybrid retrievers outperform individual sparse or dense retrievers on out-of-domain and adversarial datasets.
Abstract
Hybrid retrievers can take advantage of both sparse and dense retrievers. Previous hybrid retrievers leverage indexing-heavy dense retrievers. In this work, we study the question: "Is it possible to reduce the indexing memory of hybrid retrievers without sacrificing performance?" Driven by this question, we leverage an indexing-efficient dense retriever (i.e., DrBoost) and introduce a LITE retriever that further reduces the memory of DrBoost. LITE is jointly trained with contrastive learning and knowledge distillation from DrBoost. Then, we integrate BM25, a sparse retriever, with either LITE or DrBoost to form light hybrid retrievers. Our Hybrid-LITE retriever uses 13X less memory while maintaining 98.0% of the performance of the hybrid retriever of BM25 and DPR. In addition, we study the generalization capacity of our light hybrid retrievers on an out-of-domain dataset and a set of adversarial attack datasets.…
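To illustrate how a sparse retriever such as BM25 might be integrated with a dense retriever, here is a minimal sketch of score fusion. The abstract does not state the fusion method, so this assumes min-max normalization of each retriever's scores followed by linear interpolation. The names `minmax`, `hybrid_rank`, `weight`, and the toy document IDs are hypothetical.

```python
def minmax(scores):
    # Rescale one retriever's scores to [0, 1] so the two are comparable.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_rank(sparse_scores, dense_scores, weight=0.5, k=10):
    # sparse_scores / dense_scores map doc id -> raw score for one query.
    # A document missing from one retriever's candidates contributes 0 there.
    sparse_n, dense_n = minmax(sparse_scores), minmax(dense_scores)
    docs = set(sparse_n) | set(dense_n)
    fused = {
        d: weight * sparse_n.get(d, 0.0) + (1 - weight) * dense_n.get(d, 0.0)
        for d in docs
    }
    # Sort by fused score (descending), breaking ties by doc id.
    return sorted(fused, key=lambda d: (-fused[d], d))[:k]

# Toy example: BM25-style scores and dense similarity scores for one query.
sparse = {"d1": 12.0, "d2": 7.0, "d3": 2.0}
dense = {"d2": 0.9, "d3": 0.8, "d4": 0.1}
ranking = hybrid_rank(sparse, dense)  # ['d2', 'd1', 'd3', 'd4']
```

In the toy example, `d2` ranks first because both retrievers score it reasonably well, even though neither the sparse nor the dense ranking alone places every relevant candidate at the top.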
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies
Methods: Knowledge Distillation · Contrastive Learning
