Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
Zhuowan Li, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

TL;DR
This paper compares Retrieval Augmented Generation and long-context LLMs, demonstrating that while long-context models outperform RAG with sufficient resources, a hybrid approach called Self-Route can balance performance and cost effectively.
Contribution
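The paper provides a comprehensive comparison of RAG and long-context LLMs and introduces Self-Route, a hybrid method that routes each query to RAG or to the long-context model based on the model's own self-reflection, reducing computation cost while keeping performance comparable to the long-context model.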
Findings
Long-context LLMs outperform RAG with sufficient resources.
RAG offers significantly lower computational cost.
Self-Route effectively balances performance and cost.
Abstract
Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that when resourced sufficiently, LC consistently outperforms RAG in terms of average performance. However, RAG's significantly lower cost remains a distinct advantage. Based on this observation, we propose Self-Route, a simple yet effective method that routes queries to RAG or LC based on model self-reflection. Self-Route significantly reduces the computation cost while maintaining a comparable performance to LC. Our…
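The abstract describes Self-Route only as routing queries to RAG or LC "based on model self-reflection". The following is a minimal sketch of one way such routing could look, not the authors' implementation. It assumes a two-step design: the model first sees only retrieved chunks and may decline to answer, and only declined queries are re-run with the full context. The names `self_route`, `retrieve`, `llm`, `top_k`, the `UNANSWERABLE` sentinel, and the prompt wording are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical sentinel the model is asked to emit when the retrieved
# chunks are not enough to answer the query (an assumption of this sketch).
UNANSWERABLE = "unanswerable"


def self_route(
    query: str,
    document: str,
    retrieve: Callable[[str, str, int], List[str]],  # caller-supplied retriever
    llm: Callable[[str], str],                       # caller-supplied model call
    top_k: int = 5,
) -> Tuple[str, str]:
    """Return (answer, route), where route is "rag" or "lc"."""
    # Step 1 (cheap): answer from retrieved chunks, letting the model decline.
    chunks = retrieve(query, document, top_k)
    rag_prompt = (
        "Answer the question using only the passages below. "
        f"If they are insufficient, reply exactly '{UNANSWERABLE}'.\n\n"
        + "\n".join(chunks)
        + f"\n\nQuestion: {query}"
    )
    answer = llm(rag_prompt).strip()
    if answer.lower() != UNANSWERABLE:
        return answer, "rag"

    # Step 2 (expensive): fall back to the full long context.
    lc_prompt = f"{document}\n\nQuestion: {query}"
    return llm(lc_prompt).strip(), "lc"
```

Under this assumed design, the cost saving comes from the fact that only the queries the model declines in the first step pay for a full-context call; every other query is answered from the much shorter retrieved input.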
Taxonomy
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Weight Decay · WordPiece · Attention Dropout · Adam · Label Smoothing · Linear Layer · Byte Pair Encoding
