LLMs as Sparse Retrievers: A Framework for First-Stage Product Search
Hongru Song, Yu-an Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Sen Li, Wenjun Peng, Fuyu Lv, Xueqi Cheng

TL;DR
This paper introduces PROSPER, a novel framework that uses large language models as sparse retrievers for product search, addressing vocabulary mismatch and hallucination issues to improve retrieval quality and online revenue.
Contribution
PROSPER integrates literal residual networks and lexical focusing to enhance sparse retrieval with LLMs, overcoming hallucination and training challenges in product search.
Findings
PROSPER outperforms traditional sparse retrieval baselines.
PROSPER achieves recall comparable to dense retrievers.
Online experiments show revenue improvements.
Abstract
Product search is a crucial component of modern e-commerce platforms, with billions of user queries every day. In product search systems, first-stage retrieval should achieve high recall while ensuring efficient online deployment. Sparse retrieval is particularly attractive in this context due to its interpretability and storage efficiency. However, sparse retrieval methods suffer from severe vocabulary mismatch issues, leading to suboptimal performance in product search scenarios. With their potential for semantic analysis, large language models (LLMs) offer a promising avenue for mitigating vocabulary mismatch issues and thereby improving retrieval quality. Directly applying LLMs to sparse retrieval in product search exposes two key challenges: (1) Queries and product titles are typically short and highly susceptible to LLM-induced hallucinations, such as generating irrelevant expansion…
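The vocabulary mismatch problem described in the abstract can be illustrated with a toy example. The sketch below is not PROSPER's method: it uses plain bag-of-words term counts, and the helper names (`sparse_vector`, `sparse_score`) and the hand-picked expansion terms and weights are hypothetical, chosen only to show why adding weighted expansion terms to a sparse representation can recover a match that exact term overlap misses.

```python
from collections import Counter


def sparse_vector(text, expansions=None):
    """Build a toy sparse representation: term -> weight.

    Literal terms get their raw counts. `expansions` is a hypothetical,
    hand-written dict of extra terms and weights standing in for whatever
    a learned model might add.
    """
    weights = Counter(text.lower().split())
    for term, weight in (expansions or {}).items():
        weights[term] += weight
    return dict(weights)


def sparse_score(query_vec, doc_vec):
    """Dot product over the terms that both sparse vectors contain."""
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)


product = sparse_vector("running shoes for men")

# No literal term is shared ("mens" != "men", "sneakers" != "shoes"),
# so exact-match scoring finds nothing.
query = sparse_vector("mens sneakers")
mismatch_score = sparse_score(query, product)

# Assumed expansion terms with down-weighted scores (illustrative only).
expanded_query = sparse_vector("mens sneakers", expansions={"shoes": 0.5, "men": 0.5})
expanded_score = sparse_score(expanded_query, product)

print(mismatch_score, expanded_score)
```

In this toy setting the unexpanded query scores zero against a clearly relevant product, while the expanded query scores above zero. The same mechanism also shows the risk the abstract raises: an irrelevant expansion term would add spurious matches in exactly the same way.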
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Text and Document Classification Technologies
