TL;DR
This paper investigates methods to enhance SPLADE model efficiency, reducing latency and computational costs while maintaining competitive retrieval performance, and introduces techniques to match traditional BM25 latency under similar constraints.
Contribution
The paper proposes several techniques to improve SPLADE efficiency while preserving retrieval performance: L1 regularization for queries, separate document and query encoders, FLOPS-regularized middle-training, and faster query encoders.
Findings
Achieved similar latency to BM25 under the same computing constraints.
Improved in-domain retrieval performance.
First neural models to match traditional retrieval latency with minimal performance loss.
Abstract
Latency and efficiency issues are often overlooked when evaluating IR models based on Pretrained Language Models (PLMs) because of the multiple hardware and software testing scenarios involved. Nevertheless, efficiency is an important part of such systems and should not be overlooked. In this paper, we focus on improving the efficiency of the SPLADE model since it has achieved state-of-the-art zero-shot performance and competitive results on TREC collections. SPLADE efficiency can be controlled via a regularization factor, but controlling this regularization alone has been shown not to be efficient enough. In order to reduce the latency gap between SPLADE and traditional retrieval systems, we propose several techniques including L1 regularization for queries, a separation of document/query encoders, a FLOPS-regularized middle-training, and the use of faster query encoders. Our benchmark…
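To make the two regularizers named in the abstract concrete, here is a minimal pure-Python sketch. It assumes the usual definitions: the L1 penalty is the batch mean of the summed absolute term weights, and the FLOPS penalty is the sum over vocabulary terms of the squared batch-mean activation. The function names, the toy vectors, and the choice to apply the FLOPS penalty to document representations are illustrative assumptions, not details taken from the paper's code.

```python
from typing import List

Batch = List[List[float]]  # one sparse term-weight vector per query or document


def l1_penalty(reps: Batch) -> float:
    """Batch mean of the L1 norm: pushes individual vectors toward fewer active terms."""
    return sum(sum(abs(w) for w in rep) for rep in reps) / len(reps)


def flops_penalty(reps: Batch) -> float:
    """Sum over vocabulary terms of the squared mean absolute activation in the batch."""
    n = len(reps)
    vocab_size = len(reps[0])
    return sum(
        (sum(abs(rep[j]) for rep in reps) / n) ** 2
        for j in range(vocab_size)
    )


def regularized_loss(
    ranking_loss: float,
    query_reps: Batch,
    doc_reps: Batch,
    lambda_q: float,
    lambda_d: float,
) -> float:
    """Hypothetical combination: L1 on queries, FLOPS on documents (an assumption)."""
    return (
        ranking_loss
        + lambda_q * l1_penalty(query_reps)
        + lambda_d * flops_penalty(doc_reps)
    )


# Toy batches over a tiny vocabulary; values are made up for illustration.
queries = [[0.0, 2.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]]
docs = [[1.0, 0.0, 3.0], [1.0, 2.0, 1.0]]
total = regularized_loss(1.0, queries, docs, lambda_q=0.1, lambda_d=0.01)
```

Raising `lambda_q` makes queries sparser, which shortens the posting lists traversed at query time; the weights shown here are arbitrary placeholders.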
Taxonomy
Methods: L1 Regularization
