Augmented Embeddings for Custom Retrievals

Anirudh Khatry; Yasharth Bajpai; Priyanshu Gupta; Sumit Gulwani; Ashish Tiwari

arXiv:2310.05380·cs.IR·October 10, 2023

Open Access

TL;DR

This paper introduces Adapted Dense Retrieval, a method that fine-tunes pretrained embeddings with a low-rank residual to improve performance on task-specific, heterogeneous, and strict retrieval, especially for small Top-K values.

Contribution

It proposes a novel low-rank residual adaptation technique to enhance pretrained embeddings for specialized heterogeneous retrieval tasks.
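To make the idea of a low-rank residual concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation: the abstract on this page is truncated, so the exact parameterization and training objective are not visible here. The sketch assumes the residual is the product of two small matrices added to the pretrained embedding, and that retrieval ranks corpus items by cosine similarity. The names `adapt`, `top_k`, `down`, and `up` are hypothetical, random vectors stand in for real pretrained embeddings, and no training loop is shown.

```python
import numpy as np

def adapt(embeddings, down, up):
    """Add a low-rank residual to embeddings: e' = e + (e @ down) @ up.

    `down` has shape (dim, rank) and `up` has shape (rank, dim), so the
    residual map `down @ up` has rank at most `rank`. (Assumed form.)
    """
    return embeddings + (embeddings @ down) @ up

def top_k(query_vec, corpus_vecs, k):
    """Return indices of the k corpus vectors most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
dim, rank = 8, 2

# Random stand-ins for pretrained embeddings (assumption: no real model here).
corpus = rng.normal(size=(5, dim))
query = rng.normal(size=dim)

# Zero-initializing `up` makes the residual vanish, so the adapted embedding
# starts out identical to the pretrained one before any training.
down = rng.normal(size=(dim, rank)) * 0.1
up = np.zeros((rank, dim))

adapted_query = adapt(query, down, up)
baseline_hits = top_k(query, corpus, k=3)
adapted_hits = top_k(adapted_query, corpus, k=3)
```

With `up` at zero the two rankings coincide. Training `down` and `up` on task-specific query and artifact pairs would be what moves the adapted ranking away from the baseline. The residual adds only `2 * dim * rank` parameters per adapter, which is the usual appeal of a low-rank form when task data is limited.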

Findings

01. Significant improvements over baseline dense retrieval methods.

02. Effective in strict, small Top-K retrieval scenarios.

03. Applicable to heterogeneous artifact retrieval tasks.

Abstract

Information retrieval involves selecting artifacts from a corpus that are most relevant to a given search query. The flavor of retrieval typically used in classical applications can be termed homogeneous and relaxed: queries and corpus elements are both natural language (NL) utterances (homogeneous), and the goal is to pick the most relevant elements from the corpus in the Top-K, where K is large, such as 10, 25, 50 or even 100 (relaxed). Recently, retrieval has been used extensively in preparing prompts for large language models (LLMs) to enable LLMs to perform targeted tasks. These new applications of retrieval are often heterogeneous and strict -- the queries and the corpus contain different kinds of entities, such as NL and code, and there is a need to improve retrieval at Top-K for small values of K, such as K=1, 3, or 5. Current dense retrieval techniques based on…

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies