CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search
Ao Xie, Jiahui Chen, Quanzhi Zhu, Xiaoze Jiang, Zhiheng Qin, Enyun Yu, Han Li

TL;DR
CroPS is a training approach for dense retrieval in short-video search that adds diverse positive samples from multiple perspectives, reducing the filter-bubble bias of training only on historically exposed interactions and improving retrieval performance.
Contribution
The paper proposes CroPS, a data engine that leverages multi-perspective positive samples and a hierarchical label strategy to enhance dense retrieval models.
Findings
Significantly improves retrieval accuracy on Kuaishou Search
Reduces query reformulation rates in live deployment
Achieves superior offline and online performance
Abstract
Dense retrieval has become a foundational paradigm in modern search systems, especially on short-video platforms. However, most industrial systems adopt a self-reinforcing training pipeline that relies on historically exposed user interactions for supervision. This paradigm inevitably leads to a filter bubble effect, where potentially relevant but previously unseen content is excluded from the training signal, biasing the model toward narrow and conservative retrieval. In this paper, we present CroPS (Cross-Perspective Positive Samples), a novel retrieval data engine designed to alleviate this problem by introducing diverse and semantically meaningful positive examples from multiple perspectives. CroPS enhances training with positive signals derived from user query reformulation behavior (query-level), engagement data in recommendation streams (system-level), and world knowledge…
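The idea of pooling positives from several perspectives and labeling them hierarchically can be illustrated with a minimal sketch. This is not the authors' implementation: the source names (`exposed_click`, `query_reformulation`, `recommendation`, `world_knowledge`), the numeric levels, and the function `merge_positives` are all hypothetical, chosen only to show how positives from different perspectives might be merged into one graded training set.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical label levels (assumption, not from the paper):
# a higher number means the positive is treated as stronger supervision.
LEVELS: Dict[str, int] = {
    "exposed_click": 3,        # positives from historically exposed interactions
    "query_reformulation": 2,  # query-level positives
    "recommendation": 2,       # system-level positives from recommendation streams
    "world_knowledge": 1,      # positives derived from world knowledge
}

Pair = Tuple[str, str]  # (query, video_id)


def merge_positives(sources: Dict[str, Iterable[Pair]]) -> Dict[Pair, int]:
    """Merge (query, video_id) pairs from several sources.

    When the same pair appears in more than one source, the highest
    label level is kept.
    """
    merged: Dict[Pair, int] = {}
    for source, pairs in sources.items():
        level = LEVELS[source]
        for pair in pairs:
            merged[pair] = max(level, merged.get(pair, 0))
    return merged


if __name__ == "__main__":
    sources = {
        "exposed_click": [("q1", "v1")],
        "query_reformulation": [("q1", "v1"), ("q1", "v2")],
        "world_knowledge": [("q1", "v3")],
    }
    for pair, level in sorted(merge_positives(sources).items()):
        print(pair, level)
```

In a setup like this, the graded labels could then weight or order positives in a contrastive loss, so that pairs never shown in past search results still contribute training signal.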
Taxonomy
Topics: Information Retrieval and Search Behavior · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
