UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers

Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Arafat Sultan, Christopher Potts

arXiv:2303.00807 · cs.IR · October 16, 2023 · 1 citation

TL;DR

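This paper introduces UDAPDR, a method that uses large language models to generate synthetic queries for unsupervised domain adaptation in information retrieval. Rerankers trained on these queries are distilled into a single retriever, which improves zero-shot accuracy while keeping latency well below that of standard reranking.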

Contribution

It proposes an approach that combines LLM prompting with reranker distillation to adapt retrievers to new domains without labeled data.

Findings

1. Boosts zero-shot accuracy in long-tail domains
2. Achieves lower latency than standard reranking methods
3. Effective in domain adaptation for IR tasks

Abstract

Many information retrieval tasks require large labeled datasets for fine-tuning. However, such datasets are often unavailable, and their utility for real-world applications can diminish quickly due to domain shifts. To address this challenge, we develop and motivate a method for using large language models (LLMs) to generate large numbers of synthetic queries cheaply. The method begins by generating a small number of synthetic queries using an expensive LLM. After that, a much less expensive one is used to create large numbers of synthetic queries, which are used to fine-tune a family of reranker models. These rerankers are then distilled into a single efficient retriever for use in the target domain. We show that this technique boosts zero-shot accuracy in long-tail domains and achieves substantially lower latency than standard reranking methods.
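The four stages described in the abstract can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' implementation in primeqa/primeqa: `LLM`, `Scorer`, `build_prompt`, `generate_queries`, `train_rerankers`, `fit_reranker`, and `distillation_loss` are hypothetical names. The prompt format, the way seed queries are split across prompt sets, averaging the rerankers into a single teacher, and the KL-divergence distillation objective are illustrative choices rather than details given in the abstract.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces, not taken from the paper or primeqa:
# an LLM maps a prompt to generated text; a scorer maps (query, passage) to a score.
LLM = Callable[[str], str]
Scorer = Callable[[str, str], float]
Pair = Tuple[str, str]  # (passage, synthetic query)


def build_prompt(demos: Sequence[Pair], passage: str) -> str:
    """Few-shot prompt: (passage, query) demonstrations, then the target passage."""
    blocks = [f"Passage: {p}\nQuery: {q}" for p, q in demos]
    blocks.append(f"Passage: {passage}\nQuery:")
    return "\n\n".join(blocks)


def generate_queries(llm: LLM, demos: Sequence[Pair], passages: Sequence[str]) -> List[Pair]:
    """Call the LLM once per passage to obtain one synthetic query each."""
    return [(p, llm(build_prompt(demos, p)).strip()) for p in passages]


def train_rerankers(
    expensive_llm: LLM,
    cheap_llm: LLM,
    passages: Sequence[str],
    n_seed: int,
    n_prompt_sets: int,
    fit_reranker: Callable[[List[Pair]], Scorer],
) -> List[Scorer]:
    # Stage 1: only a few passages go to the expensive LLM (zero-shot prompt).
    seed = generate_queries(expensive_llm, [], passages[:n_seed])
    rerankers = []
    for k in range(n_prompt_sets):
        # Stage 2: each prompt set uses a different slice of the seed pairs as
        # demonstrations (falling back to all of them if the slice is empty),
        # and the cheap LLM writes a query for every passage in the corpus.
        demos = seed[k::n_prompt_sets] or seed
        synthetic = generate_queries(cheap_llm, demos, passages)
        # Stage 3: fine-tune one reranker per synthetic query set.
        rerankers.append(fit_reranker(synthetic))
    return rerankers


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def distillation_loss(
    rerankers: Sequence[Scorer], retriever: Scorer, query: str, candidates: Sequence[str]
) -> float:
    """Stage 4: KL(teacher || student) over one query's candidate passages.

    Assumption: the teacher score is the mean score of the reranker ensemble.
    """
    teacher = log_softmax(
        [sum(r(query, c) for r in rerankers) / len(rerankers) for c in candidates]
    )
    student = log_softmax([retriever(query, c) for c in candidates])
    return sum(math.exp(t) * (t - s) for t, s in zip(teacher, student))
```

Any two prompt-to-text functions can be passed as `expensive_llm` and `cheap_llm` to exercise the control flow. The point of the structure is cost: the expensive model is called only `n_seed` times, while the number of cheap-model calls grows with the corpus size and the number of prompt sets.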

Code & Models

Repositories

primeqa/primeqa · PyTorch · Official

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques