Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval
Xiaojie Sun, Keping Bi, Jiafeng Guo, Xinyu Ma, Yixing Fan, Hongyu Shan, Qishen Zhang, Zhongyi Liu

TL;DR
This paper introduces a pre-training method for multi-aspect dense retrieval that leverages aspect content as text strings and employs mutual prediction objectives, improving relevance matching in product search scenarios.
Contribution
The paper proposes a pre-training approach that represents aspect content as text and trains with mutual prediction objectives, capturing semantic similarities between aspect values better than methods that treat those values as class IDs.
Findings
Outperforms baselines on real-world datasets
Effectively captures semantic similarities of aspect content
Enhances multi-aspect dense retrieval performance
Abstract
Grounded on pre-trained language models (PLMs), dense retrieval has been studied extensively on plain text. In contrast, there has been little research on retrieving data with multiple aspects using dense models. In scenarios such as product search, aspect information plays an essential role in relevance matching, e.g., category: Electronics, Computers, and Pet Supplies. A common way of leveraging aspect information for multi-aspect retrieval is to introduce an auxiliary classification objective, i.e., using item contents to predict the annotated value IDs of item aspects. However, by learning the value embeddings from scratch, this approach may not sufficiently capture the various semantic similarities between the values. To address this limitation, we leverage the aspect information as text strings rather than class IDs during pre-training so that their semantic similarities…
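The idea of treating aspect values as text and predicting in both directions can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract shown here is truncated, so the masking scheme (hiding one whole side at a time), the "[MASK]" and "[SEP]" tokens, the helper names, and the "brand: Acme" aspect are all assumptions made for illustration. It only shows how a single item could yield one content-to-aspect and one aspect-to-content training pair.

```python
from typing import Dict, List, Tuple

MASK = "[MASK]"  # assumed BERT-style mask token
SEP = "[SEP]"    # assumed separator between content and aspect text


def aspect_to_text(aspects: Dict[str, str]) -> str:
    """Render aspect annotations as a text string instead of class IDs."""
    return " ; ".join(f"{name}: {value}" for name, value in aspects.items())


def mask_all(text: str) -> str:
    """Replace every whitespace-separated token with the mask token.

    Masking the whole side is a simplifying assumption of this sketch.
    """
    return " ".join(MASK for _ in text.split())


def mutual_prediction_pairs(
    content: str, aspects: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Build (model input, prediction target) pairs for both directions."""
    aspect_text = aspect_to_text(aspects)
    return [
        # Content -> aspect: aspect text is hidden, content stays visible.
        (f"{content} {SEP} {mask_all(aspect_text)}", aspect_text),
        # Aspect -> content: content is hidden, aspect text stays visible.
        (f"{mask_all(content)} {SEP} {aspect_text}", content),
    ]


# Hypothetical item; only the "category: Electronics" value comes from the abstract.
pairs = mutual_prediction_pairs(
    "wireless optical mouse",
    {"category": "Electronics", "brand": "Acme"},
)
for model_input, target in pairs:
    print(model_input, "->", target)
```

Because the targets are token sequences rather than opaque IDs, values such as "Computers" and a longer category name containing the same word share vocabulary with each other and with the item content. That shared vocabulary is the semantic signal the abstract says ID-based classification may miss.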
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
