Promptagator: Few-shot Dense Retrieval From 8 Examples
Zhuyun Dai, Vincent Y. Zhao, Ji Ma, Yi Luan, Jianmo Ni, Jing Lu, Anton Bakalov, Kelvin Guu, Keith B. Hall, Ming-Wei Chang

TL;DR
This paper introduces Promptagator, a few-shot dense retrieval method that uses a large language model to generate task-specific queries from a handful of examples and trains retrievers on the generated data. These retrievers outperform models trained on large supervised datasets such as MS MARCO.
Contribution
The paper presents a novel approach leveraging LLMs for task-specific query generation in few-shot retrieval, eliminating the need for large training datasets.
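To make few-shot query generation concrete, here is a minimal sketch of how a prompt might be assembled from a task's few (document, query) examples. The `Example` class, the `build_prompt` function, and the "Document:/Query:" template are hypothetical illustrations, not the paper's actual prompt format, and the LLM call itself is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    """One (document, query) pair supplied with the task (hypothetical structure)."""
    document: str
    query: str


def build_prompt(instruction: str, examples: list[Example], new_document: str,
                 max_examples: int = 8) -> str:
    """Assemble a few-shot prompt asking an LLM to write a query for new_document.

    The "Document:/Query:" template is an illustrative assumption.
    """
    if len(examples) > max_examples:
        raise ValueError(f"expected at most {max_examples} examples, got {len(examples)}")
    parts = [instruction.strip(), ""]
    for ex in examples:
        parts.append(f"Document: {ex.document}")
        parts.append(f"Query: {ex.query}")
        parts.append("")  # blank line between demonstrations
    # End with an open "Query:" slot for the LLM to complete.
    parts.append(f"Document: {new_document}")
    parts.append("Query:")
    return "\n".join(parts)


if __name__ == "__main__":
    demos = [Example("Aspirin is commonly used to reduce fever.", "does aspirin lower fever")]
    print(build_prompt("Write a question that the document answers.", demos,
                       "Ibuprofen is a nonsteroidal anti-inflammatory drug."))
```

Under this sketch, each document in the task's corpus would be formatted the same way, and the query the LLM writes, paired with its source document, becomes one synthetic training example.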
Findings
Using only 8 examples, Promptagator outperforms models trained on MS MARCO.
Training re-rankers on the same generated data adds a further 5.0 nDCG points.
Promptagator's prompt-based query generation surpasses previous methods in few-shot retrieval settings.
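The findings are stated in nDCG points. As a rough guide to what that metric measures, the sketch below computes nDCG@k for a single query using one common formulation (linear gain rel_i divided by log2(i + 1)); other variants use 2^rel - 1 as the gain. The function names are hypothetical, and this is not claimed to be the evaluation code behind the reported numbers.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_doc_ids: List[str], qrels: Dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query.

    ranked_doc_ids: documents in the order the retriever returned them.
    qrels: graded relevance judgments {doc_id: relevance}; unjudged docs count as 0.
    """
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0.0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(gains, k) / ideal_dcg


if __name__ == "__main__":
    qrels = {"d1": 1.0, "d2": 1.0}
    print(ndcg_at_k(["d3", "d1", "d2"], qrels))  # relevant docs ranked 2nd and 3rd
```

Assuming the usual convention of averaging over queries and reporting on a 0 to 100 scale, a gain of 5.0 nDCG points corresponds to 0.05 on the 0 to 1 scale this function returns.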
Abstract
Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize from one task to all the rest. However, this overlooks the fact that there are many diverse and unique retrieval tasks, each targeting different search intents, queries, and search domains. In this paper, we suggest working on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. To amplify the power of a few examples, we propose Prompt-based Query Generation for Retriever (Promptagator), which leverages large language models (LLMs) as few-shot query generators and creates task-specific retrievers based on the generated data. Powered by LLMs' generalization ability, Promptagator makes it…
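The abstract says Promptagator creates task-specific retrievers from the generated data. A common way to train a dual encoder on (query, document) pairs is an in-batch softmax loss: each generated query treats the document it was generated from as its positive and every other document in the batch as a negative. The sketch below computes that loss on precomputed embeddings in plain Python. It illustrates the general technique under these assumptions and is not the paper's exact training recipe; the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Similarity score between a query and a document embedding."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_softmax_loss(query_embs: List[Vector], doc_embs: List[Vector],
                          temperature: float = 1.0) -> float:
    """Mean cross-entropy where query i's positive is doc i and the other
    documents in the batch serve as negatives."""
    if len(query_embs) != len(doc_embs):
        raise ValueError("need exactly one positive document per query")
    total = 0.0
    for i, q in enumerate(query_embs):
        scores = [dot(q, d) / temperature for d in doc_embs]
        m = max(scores)  # subtract the max before exp for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]  # equals -log softmax(scores)[i]
    return total / len(query_embs)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    docs = [[1.0, 0.0], [0.0, 1.0]]  # each query aligned with its own document
    print(in_batch_softmax_loss(queries, docs))
```

The loss falls as each query's score for its own document rises relative to the other documents in the batch, so larger batches supply more negatives at no extra labeling cost.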
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
