Query-as-context Pre-training for Dense Passage Retrieval
Xing Wu, Guangyuan Ma, Wanhui Qian, Zijia Lin, Songlin Hu

TL;DR
This paper introduces query-as-context pre-training for dense passage retrieval, which pairs each passage with a query derived from that passage and uses these passage-query pairs to improve retrieval performance and training efficiency.
Contribution
It proposes a pre-training method that replaces the passage-passage pairs of earlier context-supervised approaches with passage-query pairs, addressing the problem that two passages from the same document may be only weakly correlated.
Findings
Significant performance improvements on large-scale benchmarks
Effective in out-of-domain zero-shot scenarios
Speeds up the training process
Abstract
Recently, methods have been developed to improve the performance of dense passage retrieval by using context-supervised pre-training. These methods simply consider two passages from the same document to be relevant, without taking into account the possibility of weakly correlated pairs. Thus, this paper proposes query-as-context pre-training, a simple yet effective pre-training technique to alleviate the issue. Query-as-context pre-training assumes that the query derived from a passage is more likely to be relevant to that passage and forms a passage-query pair. These passage-query pairs are then used in contrastive or generative context-supervised pre-training. The pre-trained models are evaluated on large-scale passage retrieval benchmarks and out-of-domain zero-shot benchmarks. Experimental results show that query-as-context pre-training brings considerable gains and meanwhile speeds…
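The contrastive use of passage-query pairs described in the abstract can be illustrated with a minimal sketch. It assumes a standard in-batch-negative contrastive (InfoNCE-style) loss, which the summary above does not spell out; the function names, the temperature parameter, and the toy two-dimensional vectors are hypothetical stand-ins for real encoder outputs, not the authors' implementation.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(passage_vecs, query_vecs, temperature=1.0):
    """Mean contrastive loss over a batch of passage-query pairs.

    Passage i is treated as relevant to query i (the query derived from it);
    every other query in the batch serves as a negative. This loss form is an
    assumption for illustration, not taken from the source.
    """
    assert len(passage_vecs) == len(query_vecs)
    total = 0.0
    for i, p in enumerate(passage_vecs):
        logits = [dot(p, q) / temperature for q in query_vecs]
        # Numerically stable log-sum-exp over all queries in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the matching query.
        total += log_z - logits[i]
    return total / len(passage_vecs)


# Toy embeddings (hypothetical): two passages and the queries derived from them.
passages = [[1.0, 0.0], [0.0, 1.0]]
aligned_queries = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its passage
swapped_queries = [[0.0, 1.0], [1.0, 0.0]]   # each query matches the other passage

loss_aligned = in_batch_contrastive_loss(passages, aligned_queries)
loss_swapped = in_batch_contrastive_loss(passages, swapped_queries)
print(loss_aligned < loss_swapped)
```

In this toy setup the loss is lower when each passage embedding lies closest to its own query, which is the behaviour such an objective would encourage during pre-training.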
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Contrastive Learning
