LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding

Mingrui Wu; Sheng Cao

arXiv:2404.05825 · cs.IR · April 10, 2024 · 1 citation

Open Access

TL;DR

This paper presents a model-agnostic framework that enhances retrieval models through LLM augmentation and improved training components, leading to state-of-the-art results on the LoTTE and BEIR datasets.

Contribution

It introduces a model-agnostic doc-level embedding framework with LLM augmentation and improved training components for retrieval models.

Findings

01. Significant performance improvements on LoTTE and BEIR datasets.

02. Enhanced effectiveness of Bi-encoder and late-interaction retrieval models.

03. Achieved state-of-the-art retrieval results.

Abstract

Recently, embedding-based retrieval, also called dense retrieval, has shown state-of-the-art results compared with traditional sparse or bag-of-words approaches. This paper introduces a model-agnostic doc-level embedding framework based on large language model (LLM) augmentation. It also improves important components of the retrieval model training process, such as negative sampling and the loss function. With this LLM-augmented retrieval framework, we significantly improve the effectiveness of widely used retriever models such as Bi-encoders (Contriever, DRAGON) and late-interaction models (ColBERTv2), achieving state-of-the-art results on the LoTTE and BEIR datasets.
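
The abstract does not spell out how a doc-level embedding is composed, so the following is only a minimal sketch of one plausible reading for a Bi-encoder. It assumes that a document is represented by its chunk embeddings plus extra "fields" of LLM-generated text (for example a title and synthetic queries), and that the relevance score is the best chunk similarity plus a weighted similarity to each field's mean embedding. The function names, the field names, and the weights are all hypothetical and are not taken from the paper; the embeddings are assumed to come from some encoder outside this snippet.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def doc_level_score(
    query_vec: Vector,
    chunk_vecs: Sequence[Vector],
    field_vecs: Dict[str, Sequence[Vector]],
    field_weights: Dict[str, float],
) -> float:
    """Hypothetical doc-level relevance score (an assumption, not the paper's formula).

    score = max cosine(query, chunk)
            + sum over fields of weight * cosine(query, mean field embedding)
    """
    q = normalize(query_vec)
    # Best-matching chunk of the original document text.
    score = max(dot(q, normalize(c)) for c in chunk_vecs)
    # Each LLM-generated field contributes a weighted similarity term.
    for name, vecs in field_vecs.items():
        if not vecs:
            continue
        field_emb = normalize(mean_vector(vecs))
        score += field_weights.get(name, 0.0) * dot(q, field_emb)
    return score


# Toy 2-dimensional example with made-up embeddings.
example_score = doc_level_score(
    query_vec=[1.0, 0.0],
    chunk_vecs=[[1.0, 0.0], [0.0, 1.0]],
    field_vecs={
        "title": [[1.0, 0.0]],
        "synthetic_queries": [[1.0, 0.0], [0.0, 1.0]],
    },
    field_weights={"title": 0.5, "synthetic_queries": 0.5},
)
```

Because the field terms are added to the score rather than baked into the encoder, a sketch like this leaves the underlying retriever untouched, which is one way to interpret the "model-agnostic" claim.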

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling