On the Value of Behavioral Representations for Dense Retrieval
Nan Jiang, Dhivya Eswaran, Choon Hui Teo, Yexiang Xue, Yesh Dattatreya, Sujay Sanghavi, Vishy Vishwanathan

TL;DR
This paper introduces MVG, a method that augments semantic document representations with behavioral data, improving dense retrieval performance in skewed, real-world e-commerce search settings with minimal additional memory.
Contribution
The paper proposes MVG, a novel approach that clusters user queries to enhance document representations for dense retrieval, addressing limitations of existing semantic-only methods.
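The general idea of clustering a document's associated queries into extra document vectors can be illustrated with a minimal sketch. This is not the authors' implementation: this summary does not describe MVG's actual clustering procedure, budget allocation, or scoring, so plain k-means, the budget rule (one behavioral view per `queries_per_view` logged queries, capped at `max_behavioral`), and max-inner-product scoring are all assumptions. The function names are hypothetical, and random vectors stand in for bi-encoder embeddings.

```python
import numpy as np


def kmeans_centroids(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct input points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def build_document_views(semantic_vec, query_vecs, max_behavioral=4, queries_per_view=10):
    """Stack the semantic vector with centroids of the document's query embeddings.

    ASSUMPTION: the budget rule here is hypothetical. It only illustrates that
    head documents (many logged queries) receive more vectors than tail documents.
    """
    k = min(max_behavioral, len(query_vecs) // queries_per_view)
    if k == 0:
        # Too little behavioral data: fall back to the semantic vector alone.
        return semantic_vec[None, :]
    behavioral = kmeans_centroids(query_vecs, k)
    return np.vstack([semantic_vec[None, :], behavioral])


def score(query_vec, doc_views):
    """ASSUMPTION: score a document by its best inner product over all views."""
    return float((doc_views @ query_vec).max())


# Toy data: random vectors stand in for bi-encoder embeddings.
rng = np.random.default_rng(0)
dim = 8
semantic = rng.normal(size=dim)
head_views = build_document_views(semantic, rng.normal(size=(40, dim)))
tail_views = build_document_views(semantic, rng.normal(size=(3, dim)))
print(head_views.shape, tail_views.shape)  # (5, 8) (1, 8)
```

Under these assumptions, the head document with 40 logged queries is stored as five vectors while the tail document with three queries keeps only its semantic vector, so the extra memory is spent mostly on the few documents that dominate traffic.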
Findings
MVG significantly improves retrieval metrics across multiple datasets.
The approach incurs only marginal memory overhead.
In the reported experiments, MVG outperforms existing bi-encoder models.
Abstract
We consider text retrieval within dense representational space in real-world settings such as e-commerce search where (a) document popularity and (b) diversity of queries associated with a document have a skewed distribution. Most of the contemporary dense retrieval literature presents two shortcomings in these settings. (i) They learn an almost equal number of representations per document, agnostic to the fact that a few head documents are disproportionately more critical to achieving good retrieval performance. (ii) They learn purely semantic document representations inferred from intrinsic document characteristics, which may not contain adequate information to determine the queries for which the document is relevant, especially when the document is short. We propose to overcome these limitations by augmenting semantic document representations learned by bi-encoders with behavioral…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Image and Video Retrieval Techniques
Methods: Balanced Selection
