Document Expansion by Query Prediction
Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho

TL;DR
This paper introduces a method that predicts the queries users are likely to issue for a document and expands the document with them to improve retrieval effectiveness. It reaches state-of-the-art results when paired with a re-ranker and stays fast when used without one.
Contribution
It presents a novel query prediction-based document expansion technique using sequence-to-sequence models for improved retrieval performance.
Findings
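Combined with a highly effective re-ranking component, the method achieves state-of-the-art results in two retrieval tasks.
In latency-critical settings, retrieval alone (without re-ranking) approaches the effectiveness of neural re-rankers.
Retrieval without re-ranking is much faster than the more computationally expensive neural re-rankers.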
Abstract
One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related to or representative of the documents' content. From the perspective of a question answering system, this might comprise questions the document can potentially answer. Following this observation, we propose a simple method that predicts which queries will be issued for a given document and then expands it with those predictions. The predictions come from a vanilla sequence-to-sequence model, trained using datasets consisting of pairs of queries and relevant documents. By combining our method with a highly effective re-ranking component, we achieve the state of the art in two retrieval tasks. In a latency-critical regime, retrieval results alone (without re-ranking) approach the effectiveness of more computationally expensive neural re-rankers but are much faster.
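The expansion step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `expand_document`, `toy_predictor`, and `overlap` are hypothetical names, the predictor returns hard-coded queries in place of a trained sequence-to-sequence model, and the term-overlap count is only a stand-in for a real retrieval score.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a document to predicted queries.
QueryPredictor = Callable[[str], List[str]]


def expand_document(doc: str, predict_queries: QueryPredictor) -> str:
    """Append the predicted queries to the original document text."""
    return " ".join([doc] + predict_queries(doc))


def toy_predictor(doc: str) -> List[str]:
    # Assumption: stand-in for a trained sequence-to-sequence model.
    # It returns fixed queries so the sketch runs without any model weights.
    return ["what causes tides", "why does the ocean rise and fall"]


def overlap(query: str, text: str) -> int:
    """Count query terms that also occur in the text (toy relevance signal)."""
    terms = set(text.lower().replace(".", "").split())
    return sum(term in terms for term in query.lower().split())


doc = "Tides are produced by the gravitational pull of the moon and sun."
expanded = expand_document(doc, toy_predictor)

query = "what causes ocean tides"
score_original = overlap(query, doc)
score_expanded = overlap(query, expanded)
print(score_original, score_expanded)
```

In this toy case the expanded document matches more of the query's terms than the original does, which is the vocabulary-mismatch effect that document expansion targets. In practice the expanded text would be indexed by a standard retrieval engine instead of being scored with a term count.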
Code & Models
- 🤗 doc2query/S2ORC-t5-base-v1 · model · 4 dl · ♡ 4
- 🤗 doc2query/all-t5-base-v1 · model · 193 dl · ♡ 10
- 🤗 doc2query/all-with_prefix-t5-base-v1 · model · 1.9k dl · ♡ 10
- 🤗 doc2query/msmarco-t5-base-v1 · model · 314 dl · ♡ 6
- 🤗 doc2query/msmarco-t5-small-v1 · model · 72 dl · ♡ 1
- 🤗 doc2query/reddit-t5-base-v1 · model · 5 dl · ♡ 1
- 🤗 doc2query/reddit-t5-small-v1 · model · 12 dl
- 🤗 doc2query/stackexchange-t5-base-v1 · model · 4 dl
- 🤗 doc2query/stackexchange-title-body-t5-base-v1 · model · 19 dl
- 🤗 doc2query/stackexchange-title-body-t5-small-v1 · model · 6 dl
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques
