Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation
Ji Ma, Ivan Korotkov, Yinfei Yang, Keith Hall, Ryan McDonald

TL;DR
This paper introduces a zero-shot passage retrieval method in which a question generator trained on general-domain data is applied to documents from the target domain, producing synthetic question-passage pairs that make effective neural retrieval possible without large labeled datasets.
Contribution
It presents a novel zero-shot learning approach for passage retrieval using domain-targeted synthetic questions, improving performance without requiring large supervised training sets.
Findings
Effective in domain-specific passage retrieval
Approaches supervised model accuracy in some domains
Enhances first-stage retrieval with hybrid models
Abstract
A major obstacle to the widespread adoption of neural retrieval models is that they require large supervised training sets to surpass traditional term-based techniques, which are constructed from raw corpora. In this paper, we propose an approach to zero-shot learning for passage retrieval that uses synthetic question generation to close this gap. The question generation system is trained on general domain data, but is applied to documents in the targeted domain. This allows us to create arbitrarily large, yet noisy, question-passage relevance pairs that are domain specific. Furthermore, when this is coupled with a simple hybrid term-neural model, first-stage retrieval performance can be improved further. Empirically, we show that this is an effective strategy for building neural passage retrieval models in the absence of large training corpora. Depending on the domain, this technique…
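Illustrative sketch
The two ideas in the abstract, generating synthetic question-passage pairs from target-domain passages and combining a term-based score with a neural score, can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: `toy_question_generator` is a hypothetical stand-in for a trained question generation model, `term_score` is a plain token-overlap count standing in for a term-based ranker, and the linear interpolation in `hybrid_score` (with its `weight` parameter) is an assumed way to combine the two signals.

```python
from collections import Counter


def toy_question_generator(passage):
    # Hypothetical placeholder: a real system would call a question
    # generation model trained on general-domain data.
    first_words = " ".join(passage.split()[:4]).lower()
    return f"what is {first_words}?"


def build_synthetic_pairs(passages, generate_question):
    # Pair every target-domain passage with a generated question,
    # giving (question, passage) relevance pairs for training.
    return [(generate_question(p), p) for p in passages]


def term_score(query, passage):
    # Token-overlap count; an assumed stand-in for a term-based scorer.
    q_counts = Counter(query.lower().split())
    p_counts = Counter(passage.lower().split())
    return float(sum(min(q_counts[t], p_counts[t]) for t in q_counts))


def dense_score(query_vec, passage_vec):
    # Dot product between (assumed precomputed) embedding vectors.
    return sum(a * b for a, b in zip(query_vec, passage_vec))


def hybrid_score(term, dense, weight=0.5):
    # Assumed linear interpolation of the term and neural scores.
    return weight * term + (1.0 - weight) * dense


pairs = build_synthetic_pairs(
    ["Aspirin reduces fever and pain."], toy_question_generator
)
```

Here `pairs` holds one (question, passage) tuple per input passage. In a full pipeline these noisy pairs would be used to train the neural retriever, and `hybrid_score` would be applied at query time to rank candidate passages.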
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems
