Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models
Zhiyuan Peng, Xuyang Wu, Qifan Wang, Yi Fang

TL;DR
This paper introduces a soft prompt tuning method that augments dense retrieval models with high-quality weak queries generated by large language models, which is especially useful when domain-specific training data is scarce.
Contribution
It proposes a new soft prompt tuning approach for augmenting dense retrieval, improving zero-shot and few-shot performance without relying on human-crafted prompts.
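To make the term concrete, here is a minimal sketch of what a soft prompt is in general prompt tuning: a small matrix of trainable vectors prepended to the embedded input tokens while the language model itself stays frozen. The sizes, the random embedding table, and the helper name `build_model_input` are illustrative assumptions, not details taken from the paper, and no training loop is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumptions, not values from the paper).
VOCAB_SIZE, EMBED_DIM, PROMPT_LEN = 100, 16, 4

# Frozen token embedding table, standing in for an LLM's input embeddings.
token_embeddings = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))

# The soft prompt: in prompt tuning these vectors are the only trainable
# parameters; here they are just randomly initialised.
soft_prompt = rng.normal(scale=0.1, size=(PROMPT_LEN, EMBED_DIM))


def build_model_input(token_ids, soft_prompt, token_embeddings):
    """Prepend the soft prompt vectors to the embedded input tokens."""
    embedded = token_embeddings[np.asarray(token_ids)]
    return np.concatenate([soft_prompt, embedded], axis=0)


# Three hypothetical token ids; the model would see 4 + 3 = 7 input vectors.
inputs = build_model_input([5, 17, 42], soft_prompt, token_embeddings)
print(inputs.shape)  # (7, 16)
```

Unlike a hard (human-written) prompt, these prepended vectors need not correspond to any real tokens, which is why they can be optimised directly on a small amount of labelled data.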
Findings
SPTAR (soft prompt tuning for augmenting dense retrieval) outperforms BM25 and LLM-based augmentation methods.
Soft prompt tuning effectively generates high-quality weak queries.
The method improves dense retrieval in low-data scenarios.
Abstract
Dense retrieval (DR) converts queries and documents into dense embeddings and measures the similarity between queries and documents in vector space. One of the challenges in DR is the lack of domain-specific training data. While DR models can learn from large-scale public datasets like MS MARCO through transfer learning, evidence shows that not all DR models and domains can benefit from transfer learning equally. Recently, some researchers have resorted to large language models (LLMs) to improve the zero-shot and few-shot DR models. However, the hard prompts or human-written prompts utilized in these works cannot guarantee the good quality of generated weak queries. To tackle this, we propose soft prompt tuning for augmenting DR (SPTAR): For each task, we leverage soft prompt-tuning to optimize a task-specific soft prompt on limited ground truth data and then prompt the LLMs to tag…
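The first sentence of the abstract describes the basic dense retrieval operation: embed queries and documents, then compare them in vector space. The sketch below illustrates that scoring step with toy, hand-written embeddings. Cosine similarity is an assumption here (the abstract only says "similarity"; dot product is another common choice), and the function names are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def rank_documents(query_emb, doc_embs, top_k=2):
    """Return indices and scores of the top_k documents by cosine similarity."""
    q = l2_normalize(query_emb)
    d = l2_normalize(doc_embs)
    scores = d @ q                      # one similarity score per document
    order = np.argsort(-scores)[:top_k]  # highest score first
    return order, scores[order]


# Toy 3-dimensional embeddings; a real DR model would produce these
# with a trained encoder.
doc_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_emb = np.array([1.0, 0.1, 0.0])

top_idx, top_scores = rank_documents(query_emb, doc_embs)
print(top_idx)  # [0 2]
```

Training the encoder that produces such embeddings requires query-document pairs, which is where generated weak queries come in when labelled domain data is limited.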
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
