Efficiency and Effectiveness of SPLADE Models on Billion-Scale Web Document Title
Taeryun Won, Tae Kwan Lee, Hiun Kim, Hyemin Lee

TL;DR
This paper compares SPLADE-based models with traditional methods for large-scale web document retrieval, demonstrating that with pruning strategies, SPLADE models can achieve a good balance of effectiveness and efficiency on billion-scale datasets.
Contribution
It introduces pruning techniques (document-centric pruning, top-k query term selection, and boolean query with term threshold) that improve SPLADE models' efficiency, enabling scalable deployment without significant loss in retrieval performance.
Findings
SPLADE models outperform BM25 on complex queries.
Pruning strategies reduce computational costs significantly.
Expanded-SPLADE balances effectiveness and efficiency best.
Abstract
This paper presents a comprehensive comparison of BM25, SPLADE, and Expanded-SPLADE models in the context of large-scale web document retrieval. We evaluate the effectiveness and efficiency of these models on datasets spanning from tens of millions to billions of web document titles. SPLADE and Expanded-SPLADE, which utilize sparse lexical representations, demonstrate superior retrieval performance compared to BM25, especially for complex queries. However, these models incur higher computational costs. We introduce pruning strategies, including document-centric pruning, top-k query term selection, and boolean query with term threshold, to mitigate these costs and improve the models' efficiency without significantly sacrificing retrieval performance. The results show that Expanded-SPLADE strikes the best balance between effectiveness and efficiency, particularly when handling large…
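The three pruning strategies named in the abstract can be illustrated with a small sketch over toy sparse vectors. This is not the authors' implementation: the function names, the `SparseVec` type, and the parameters `max_terms`, `k`, and `min_match` are hypothetical, and reading "boolean query with term threshold" as "a document must contain at least `min_match` of the selected query terms" is an assumption, since the abstract does not define it.

```python
from typing import Dict, List, Tuple

# Assumed representation: a sparse lexical vector maps a term to its weight.
SparseVec = Dict[str, float]


def top_weighted(vec: SparseVec, n: int) -> SparseVec:
    """Keep the n highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n])


def prune_document(doc: SparseVec, max_terms: int) -> SparseVec:
    """Document-centric pruning (assumed form): cap the terms stored per document."""
    return top_weighted(doc, max_terms)


def select_query_terms(query: SparseVec, k: int) -> SparseVec:
    """Top-k query term selection: score with only the k strongest query terms."""
    return top_weighted(query, k)


def passes_term_threshold(query: SparseVec, doc: SparseVec, min_match: int) -> bool:
    """Boolean filter (assumed form): require at least min_match shared terms."""
    return sum(1 for term in query if term in doc) >= min_match


def search(
    query: SparseVec,
    docs: Dict[str, SparseVec],
    k: int,
    min_match: int,
    top_n: int,
) -> List[Tuple[str, float]]:
    """Filter with the boolean threshold, then rank by sparse dot product."""
    pruned_query = select_query_terms(query, k)
    scored = []
    for doc_id, doc in docs.items():
        if not passes_term_threshold(pruned_query, doc, min_match):
            continue  # skipped documents cost no scoring work
        score = sum(w * doc[t] for t, w in pruned_query.items() if t in doc)
        scored.append((doc_id, score))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Toy index: each document is pruned to its 3 strongest terms at index time.
docs = {
    "d1": prune_document({"splade": 2.0, "sparse": 1.5, "retrieval": 1.0, "web": 0.2}, 3),
    "d2": prune_document({"bm25": 2.0, "retrieval": 1.2, "web": 0.9}, 3),
    "d3": prune_document({"cooking": 1.0}, 3),
}
query = {"splade": 1.0, "retrieval": 0.8, "sparse": 0.5, "model": 0.1}
results = search(query, docs, k=3, min_match=2, top_n=10)
```

In this toy setup, each strategy cuts work at a different stage: document pruning shrinks the index, query term selection reduces the number of posting lists touched, and the boolean threshold discards weakly matching documents before any scoring happens.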
Taxonomy
Topics: Information Retrieval and Search Behavior · Expert finding and Q&A systems · Web Data Mining and Analysis
