LLMs as Sparse Retrievers: A Framework for First-Stage Product Search

Hongru Song; Yu-an Liu; Ruqing Zhang; Jiafeng Guo; Maarten de Rijke; Sen Li; Wenjun Peng; Fuyu Lv; Xueqi Cheng

arXiv:2510.18527 · cs.IR · October 23, 2025

Open Access

TL;DR

This paper introduces PROSPER, a novel framework that uses large language models as sparse retrievers for product search, addressing vocabulary mismatch and hallucination issues to improve retrieval quality and online revenue.

Contribution

PROSPER integrates literal residual networks and lexical focusing to enhance sparse retrieval with LLMs, overcoming hallucination and training challenges in product search.

Findings

01

PROSPER outperforms traditional sparse retrieval baselines.

02

PROSPER achieves recall comparable to dense retrievers.

03

Online experiments show revenue improvements.

Abstract

Product search is a crucial component of modern e-commerce platforms, with billions of user queries every day. In product search systems, first-stage retrieval should achieve high recall while ensuring efficient online deployment. Sparse retrieval is particularly attractive in this context due to its interpretability and storage efficiency. However, sparse retrieval methods suffer from severe vocabulary mismatch issues, leading to suboptimal performance in product search scenarios. With their potential for semantic analysis, large language models (LLMs) offer a promising avenue for mitigating vocabulary mismatch issues and thereby improving retrieval quality. Directly applying LLMs to sparse retrieval in product search exposes two key challenges: (1) Queries and product titles are typically short and highly susceptible to LLM-induced hallucinations, such as generating irrelevant expansion…
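To make the vocabulary mismatch problem concrete, the following is a minimal sketch of sparse retrieval over an inverted index. It is not PROSPER's method: the helper names (`build_index`, `search`), the toy products, and every term weight are assumptions chosen for illustration, and the "expanded" weights are hand-written stand-ins for what a learned expansion model might produce.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted index: term -> {doc_id: weight}.

    `docs` maps each doc_id to a sparse vector {term: weight}.
    """
    index = defaultdict(dict)
    for doc_id, terms in docs.items():
        for term, weight in terms.items():
            index[term][doc_id] = weight
    return index


def search(index, query):
    """Score documents by the dot product of sparse query and document vectors."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        # Only documents that share a term with the query receive a score.
        for doc_id, d_weight in index.get(term, {}).items():
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy product titles represented by their literal terms only (made-up weights).
literal_docs = {
    "p1": {"sneakers": 1.0, "men": 1.0},
    "p2": {"laptop": 1.0, "sleeve": 1.0},
}

# The same products with hypothetical expansion terms added (made-up weights).
expanded_docs = {
    "p1": {"sneakers": 1.0, "men": 1.0, "shoes": 0.6, "running": 0.4},
    "p2": {"laptop": 1.0, "sleeve": 1.0, "case": 0.5},
}

query = {"running": 1.0, "shoes": 1.0}
literal_hits = search(build_index(literal_docs), query)
expanded_hits = search(build_index(expanded_docs), query)
```

In this toy setup the literal index returns no results for the query, because no title contains "running" or "shoes", while the expanded index retrieves `p1`. Each matching term remains visible in the index, which is the interpretability property the abstract refers to; the same mechanism also shows why an irrelevant expansion term would directly surface irrelevant products.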

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Text and Document Classification Technologies