LLMQuoter: Enhancing RAG Capabilities Through Efficient Quote Extraction From Large Contexts
Yuri Facanha Bezerra, Li Weigang

TL;DR
LLMQuoter is a lightweight, distillation-based model that improves retrieval-augmented generation by efficiently extracting relevant quotes from large contexts, leading to significant accuracy gains in reasoning tasks.
Contribution
The paper introduces LLMQuoter, a novel quote extraction model that enhances RAG performance using distillation and a quote-first approach, reducing computational overhead.
Findings
Accuracy gains of more than 20 points over full-context methods such as RAFT
Effective knowledge distillation from a high-performing teacher model
Resource-efficient fine-tuning with competitive results
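As a rough illustration of the distillation finding, the sketch below builds supervised training pairs in which a teacher model's quotes become the student's target. It is a minimal sketch under stated assumptions, not the authors' pipeline: `DistillExample`, `build_distillation_set`, the prompt layout, and the verbatim-match filter are all hypothetical, and the teacher is passed in as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical signature: (question, context) -> quotes proposed by the teacher.
Teacher = Callable[[str, str], List[str]]


@dataclass
class DistillExample:
    prompt: str   # what the student sees
    target: str   # what the student is trained to produce


def build_distillation_set(
    samples: Iterable[Tuple[str, str]], teacher: Teacher
) -> List[DistillExample]:
    """Turn (question, context) pairs into student training examples."""
    examples: List[DistillExample] = []
    for question, context in samples:
        # Assumption: keep only quotes that occur verbatim in the context,
        # so the student never learns to emit text that is not in the source.
        quotes = [q for q in teacher(question, context) if q in context]
        if not quotes:
            continue  # skip samples with no usable teacher quotes
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nQuotes:"
        examples.append(DistillExample(prompt=prompt, target="\n".join(quotes)))
    return examples
```

The resulting prompt/target pairs could then be fed to any parameter-efficient fine-tuning setup, such as a LoRA adapter on the student model.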
Abstract
We introduce LLMQuoter, a lightweight, distillation-based model designed to enhance Retrieval Augmented Generation (RAG) by extracting the most relevant textual evidence for downstream reasoning tasks. Built on the LLaMA-3B architecture and fine-tuned with Low-Rank Adaptation (LoRA) on a 15,000-sample subset of HotpotQA, LLMQuoter adopts a "quote-first-then-answer" strategy, efficiently identifying key quotes before passing curated snippets to reasoning models. This workflow reduces cognitive overhead and outperforms full-context approaches like Retrieval-Augmented Fine-Tuning (RAFT), achieving over 20-point accuracy gains across both small and large language models. By leveraging knowledge distillation from a high-performing teacher model, LLMQuoter achieves competitive results in a resource-efficient fine-tuning setup. It democratizes advanced RAG capabilities, delivering significant…
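The "quote-first-then-answer" workflow described in the abstract can be sketched as a two-stage function: extract quotes, then hand only those quotes to the reasoning model. This is a minimal sketch under stated assumptions. The names `toy_extract_quotes`, `build_prompt`, and `quote_first_then_answer` are hypothetical, and the word-overlap extractor is only a stand-in for the fine-tuned quoter model so that the example runs without any model weights.

```python
import re
from typing import Callable, List

QuoteExtractor = Callable[[str, str], List[str]]  # (question, context) -> quotes
Reasoner = Callable[[str], str]                   # prompt -> answer


def toy_extract_quotes(question: str, context: str, top_k: int = 2) -> List[str]:
    """Stand-in quoter: rank sentences by word overlap with the question."""
    q_words = set(re.findall(r"\w+", question.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    scored = [
        (len(q_words & set(re.findall(r"\w+", s.lower()))), s) for s in sentences
    ]
    # sorted() is stable, so ties keep their original order in the context.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [s for score, s in ranked[:top_k] if score > 0]


def build_prompt(question: str, quotes: List[str]) -> str:
    """Assumed prompt layout: curated quotes first, then the question."""
    evidence = "\n".join(f"- {q}" for q in quotes)
    return f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"


def quote_first_then_answer(
    question: str, context: str, extract_quotes: QuoteExtractor, reason: Reasoner
) -> str:
    """Stage 1 extracts quotes; stage 2 reasons over the quotes only."""
    quotes = extract_quotes(question, context)
    return reason(build_prompt(question, quotes))
```

In practice `extract_quotes` would wrap the quoter model and `reason` would wrap any downstream language model. The point of the split is that the reasoner never sees the full context, only the curated snippets.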
Taxonomy
Topics: Network Packet Processing and Optimization
Methods: Attention Is All You Need · Layer Normalization · Dense Connections · Linear Warmup With Linear Decay · WordPiece · Attention Dropout · Adam · Residual Connection · Dropout
