KuaiSearch: A Large-Scale E-Commerce Search Dataset for Recall, Ranking, and Relevance
Yupeng Li, Ben Chen, Mingyue Cheng, Zhiding Liu, Xuxin Zhang, Chenyi Lei, Wenwu Ou

TL;DR
KuaiSearch is the largest real-world e-commerce search dataset, capturing authentic user interactions across multiple search stages, enabling advanced research with large language models in realistic scenarios.
Contribution
The paper introduces KuaiSearch, a comprehensive, large-scale dataset from Kuaishou that includes real user queries and product texts across the recall, ranking, and relevance stages.
Findings
Provides authentic user search data that retains cold-start users and long-tail products.
Establishes benchmark experiments for multiple search tasks.
Demonstrates the dataset's value for LLM-based e-commerce search research.
Abstract
E-commerce search serves as a central interface connecting user demands with massive product inventories, and it plays a vital role in our daily lives. However, in real-world applications it faces several challenges, including highly ambiguous queries, noisy product texts with weak semantic order, and diverse user preferences, all of which make it difficult to accurately capture user intent and fine-grained product semantics. In recent years, significant advances in large language models (LLMs) for semantic representation and contextual reasoning have created new opportunities to address these challenges. Nevertheless, existing e-commerce search datasets still suffer from notable limitations: queries are often heuristically constructed, cold-start users and long-tail products are filtered out, query and product texts are anonymized, and most datasets cover only a single stage of the search…
Taxonomy
Topics: Information Retrieval and Search Behavior · Text and Document Classification Technologies · Recommender Systems and Techniques
