TL;DR
PROP introduces a novel pre-training objective for ad-hoc retrieval that predicts representative words, inspired by classical IR models, leading to improved retrieval performance especially in low-resource scenarios.
Contribution
The paper proposes PROP, a new pre-training method for IR that focuses on representative words prediction, enhancing retrieval effectiveness over existing methods.
Findings
PROP outperforms baselines in various ad-hoc retrieval tasks.
PROP achieves strong results in zero- and low-resource IR settings.
Pre-trained models and code are publicly available.
Abstract
Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks, including information retrieval (IR). However, pre-training objectives tailored for ad-hoc retrieval have not been well explored. In this paper, we propose Pre-training with Representative wOrds Prediction (PROP) for ad-hoc retrieval. PROP is inspired by the classical statistical language model for IR, specifically the query likelihood model, which assumes that the query is generated as a piece of text representative of the "ideal" document. Based on this idea, we construct the representative words prediction (ROP) task for pre-training. Given an input document, we sample a pair of word sets according to the document language model, where the set with the higher likelihood is deemed more representative of the document. We then pre-train the Transformer model to…
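The sampling step described in the abstract can be illustrated with a minimal sketch. It assumes an unsmoothed maximum-likelihood unigram model as the document language model, a fixed word-set size, and sampling with replacement (so a "set" may repeat a word); none of these choices is stated in the abstract, and the function names are hypothetical. It covers only the construction of a word-set pair and the likelihood comparison, not the pre-training objective itself.

```python
import math
import random
from collections import Counter


def document_language_model(tokens):
    """Unigram model P(w | D) = count(w, D) / |D| (assumption: no smoothing)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def sample_word_set(lm, size, rng):
    """Draw `size` words independently, with replacement, from the model."""
    words = list(lm)
    weights = [lm[w] for w in words]
    return rng.choices(words, weights=weights, k=size)


def log_likelihood(word_set, lm):
    """Log-probability of the sampled words under the unigram model."""
    return sum(math.log(lm[w]) for w in word_set)


def sample_rop_pair(tokens, size=3, seed=0):
    """Return two sampled word sets, the more likely (more representative) first.

    `size` and `seed` are illustrative parameters, not values from the paper.
    """
    rng = random.Random(seed)
    lm = document_language_model(tokens)
    first = sample_word_set(lm, size, rng)
    second = sample_word_set(lm, size, rng)
    if log_likelihood(first, lm) < log_likelihood(second, lm):
        first, second = second, first
    return first, second


if __name__ == "__main__":
    doc = "the cat sat on the mat while the other cat slept".split()
    more_representative, less_representative = sample_rop_pair(doc)
    print(more_representative, less_representative)
```

Under these assumptions, frequent document words receive higher probability, so the word set ranked first tends to contain terms that occur more often in the document.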
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dense Connections · Multi-Head Attention · Label Smoothing · Dropout · Linear Warmup With Linear Decay
