LLMQuoter: Enhancing RAG Capabilities Through Efficient Quote Extraction From Large Contexts

Yuri Facanha Bezerra; Li Weigang

arXiv:2501.05554 · cs.CL · January 13, 2025


Open Access

TL;DR

LLMQuoter is a lightweight, distillation-based model that improves retrieval-augmented generation by efficiently extracting relevant quotes from large contexts, leading to significant accuracy gains in reasoning tasks.

Contribution

The paper introduces LLMQuoter, a novel quote extraction model that enhances RAG performance using distillation and a quote-first approach, reducing computational overhead.
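The distillation step can be pictured as turning a teacher model's quote selections into supervised training pairs for the smaller student quoter. The sketch below is illustrative only. It assumes a generic `teacher` callable and keeps only quotes that occur verbatim in the context. The record format, the verbatim filter, and every name here (`Example`, `DistilledSample`, `build_distillation_set`) are hypothetical and do not reproduce the paper's pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Example:
    question: str
    context: str


@dataclass
class DistilledSample:
    prompt: str      # what the student quoter sees
    completion: str  # the teacher's quotes it should learn to reproduce


def build_distillation_set(
    examples: Iterable[Example],
    teacher: Callable[[str, str], List[str]],  # hypothetical: (question, context) -> quotes
) -> List[DistilledSample]:
    """Collect teacher quotes as training targets (assumed filter: verbatim matches only)."""
    samples: List[DistilledSample] = []
    for ex in examples:
        quotes = [q for q in teacher(ex.question, ex.context) if q and q in ex.context]
        if not quotes:
            continue  # drop examples where the teacher produced no verifiable quote
        prompt = f"Question: {ex.question}\nContext: {ex.context}\nQuotes:"
        completion = "\n".join(f"- {q}" for q in quotes)
        samples.append(DistilledSample(prompt, completion))
    return samples
```

The verbatim check is one simple way to keep a student from learning to paraphrase or hallucinate evidence. The paper may curate teacher outputs differently.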

Findings

1. Over 20-point accuracy improvements over full-context methods
2. Effective knowledge distillation from a high-performing teacher model
3. Resource-efficient fine-tuning with competitive results
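A short calculation shows why LoRA keeps fine-tuning cheap. Suppose a frozen weight matrix W has shape d_out × d_in. LoRA trains a low-rank update B·A, where B has shape d_out × r and A has shape r × d_in. That means it learns r·(d_in + d_out) parameters instead of d_in·d_out. The dimensions below, a square 3072 × 3072 projection with rank 16, are illustrative assumptions and not the configuration reported for LLMQuoter.

```python
def full_update_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix W (d_out x d_in) is updated."""
    return d_in * d_out


def lora_update_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for LoRA's factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d, r = 3072, 16  # hypothetical layer size and LoRA rank
    full = full_update_params(d, d)
    lora = lora_update_params(d, d, r)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
    # prints: full: 9,437,184  lora: 98,304  ratio: 1.04%
```

For a square matrix, the saving per adapted layer is a factor of d / (2r), which is 96 in this example. The base weights stay frozen, so only the small factors need gradients and optimizer state.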

Abstract

We introduce LLMQuoter, a lightweight, distillation-based model designed to enhance Retrieval-Augmented Generation (RAG) by extracting the most relevant textual evidence for downstream reasoning tasks. Built on the LLaMA-3B architecture and fine-tuned with Low-Rank Adaptation (LoRA) on a 15,000-sample subset of HotpotQA, LLMQuoter adopts a "quote-first-then-answer" strategy, efficiently identifying key quotes before passing curated snippets to reasoning models. This workflow reduces cognitive overhead and outperforms full-context approaches like Retrieval-Augmented Fine-Tuning (RAFT), achieving over 20-point accuracy gains across both small and large language models. By leveraging knowledge distillation from a high-performing teacher model, LLMQuoter achieves competitive results in a resource-efficient fine-tuning setup. It democratizes advanced RAG capabilities, delivering significant…
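The quote-first-then-answer workflow can be sketched as two pluggable stages. First a quoter reduces the full context to a few quotes, then a reasoner answers from those quotes alone. The sentence-overlap quoter here is a hypothetical stand-in for illustration. LLMQuoter itself is a fine-tuned LLaMA-3B model, and the reasoner would be a separate language model. All function names are assumptions.

```python
import re
from typing import Callable, List

Quoter = Callable[[str, str], List[str]]  # (question, context) -> quotes
Reasoner = Callable[[str, str], str]      # (question, evidence) -> answer


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_quoter(question: str, context: str, k: int = 2) -> List[str]:
    """Toy stand-in for a learned quoter: keep the k sentences sharing most words with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))
    scored = [
        (len(q_words & set(re.findall(r"\w+", s.lower()))), i, s)
        for i, s in enumerate(split_sentences(context))
    ]
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]  # best overlap first, ties by position
    # Restore document order and drop sentences with no overlap at all.
    return [s for score, _, s in sorted(top, key=lambda t: t[1]) if score > 0]


def quote_then_answer(question: str, context: str, quoter: Quoter, reasoner: Reasoner) -> str:
    """Stage 1 extracts quotes; stage 2 sees only those quotes, never the full context."""
    quotes = quoter(question, context)
    evidence = "\n".join(f"- {q}" for q in quotes)
    return reasoner(question, evidence)
```

Separating the two stages means the reasoner's prompt scales with the number of quotes rather than with the size of the retrieved context. Any quoter with the same signature, including a fine-tuned model, can be swapped in.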


Code & Models

yurifacanha/llmquoter (official repository)

