Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

Zhuowan Li; Cheng Li; Mingyang Zhang; Qiaozhu Mei; Michael Bendersky

arXiv:2407.16833 · cs.CL · October 18, 2024


TL;DR

This paper compares Retrieval Augmented Generation (RAG) with long-context (LC) LLMs and finds that, when sufficiently resourced, LC models outperform RAG on average, while RAG is far cheaper. A hybrid approach, Self-Route, balances the two, cutting cost while keeping performance comparable to LC.

Contribution

The paper provides a comprehensive comparison of RAG and long-context LLMs and introduces Self-Route, a hybrid method that routes each query to RAG or LC based on model self-reflection, reducing computation cost while maintaining performance comparable to LC.

Findings

1. Long-context LLMs outperform RAG on average when sufficiently resourced.
2. RAG has a significantly lower computational cost.
3. Self-Route effectively balances performance and cost.
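The cost trade-off in these findings can be made concrete with a back-of-the-envelope token count. The sketch below is illustrative only: the token counts are hypothetical (not the paper's measurements), `CostModel`, `lc_cost`, `rag_cost`, and `self_route_cost` are names introduced here, and it assumes, as we read the method, that every query first takes the cheap RAG pass and only the fraction the model declines is re-run with the full context.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical per-query input-token counts (illustrative, not from the paper)."""
    full_context_tokens: int  # tokens fed to the LC model when the whole document is used
    retrieved_tokens: int     # tokens in the retrieved chunks given to the RAG step


def lc_cost(m: CostModel) -> int:
    # Long-context: every query reads the full context.
    return m.full_context_tokens


def rag_cost(m: CostModel) -> int:
    # RAG: every query reads only the retrieved chunks.
    return m.retrieved_tokens


def self_route_cost(m: CostModel, lc_fraction: float) -> float:
    # Assumed accounting: every query pays for the RAG pass; the fraction
    # the model declines (lc_fraction) additionally pays for a full-context call.
    if not 0.0 <= lc_fraction <= 1.0:
        raise ValueError("lc_fraction must be in [0, 1]")
    return m.retrieved_tokens + lc_fraction * m.full_context_tokens


m = CostModel(full_context_tokens=100_000, retrieved_tokens=5_000)  # hypothetical sizes
print(self_route_cost(m, lc_fraction=0.25))  # → 30000.0
```

With these made-up numbers, routing a quarter of queries to LC costs 30,000 tokens per query versus 100,000 for LC alone. Because the RAG pass is always paid, this accounting makes Self-Route cheaper than LC only while the routed fraction stays below `1 - retrieved_tokens / full_context_tokens`.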

Abstract

Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three of the latest LLMs. Results reveal that when resourced sufficiently, LC consistently outperforms RAG in terms of average performance. However, RAG's significantly lower cost remains a distinct advantage. Based on this observation, we propose Self-Route, a simple yet effective method that routes queries to RAG or LC based on model self-reflection. Self-Route significantly reduces the computation cost while maintaining a comparable performance to LC. Our…
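The routing step described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the `llm` and `retrieve` callables, the prompt wording, the literal `"unanswerable"` marker, and the word-overlap `toy_retrieve` are all hypothetical stand-ins. It illustrates the idea that the model first sees only the retrieved chunks and is asked to decline if they are insufficient; only declined queries pay for a full long-context call.

```python
from typing import Callable

LLM = Callable[[str], str]                        # hypothetical: prompt in, text out
Retriever = Callable[[str, str, int], list[str]]  # hypothetical: (query, document, k) -> chunks

UNANSWERABLE = "unanswerable"  # assumed decline marker, not the paper's exact prompt


def toy_retrieve(query: str, document: str, k: int) -> list[str]:
    # Stand-in retriever: split on blank lines, rank paragraphs by word overlap with the query.
    chunks = [c for c in document.split("\n\n") if c.strip()]
    q = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]


def rag_prompt(query: str, chunks: list[str]) -> str:
    # Self-reflection step: answer from the retrieved text or explicitly decline.
    context = "\n\n".join(chunks)
    return (
        f"{context}\n\nAnswer the question using only the text above. "
        f"If it cannot be answered from this text, reply '{UNANSWERABLE}'.\n"
        f"Question: {query}"
    )


def lc_prompt(query: str, document: str) -> str:
    # Fallback step: give the long-context model the whole document.
    return f"{document}\n\nQuestion: {query}"


def self_route(query: str, document: str, retrieve: Retriever, llm: LLM,
               k: int = 5) -> tuple[str, str]:
    """Return (route, answer), where route is "rag" or "lc"."""
    answer = llm(rag_prompt(query, retrieve(query, document, k))).strip()
    if UNANSWERABLE not in answer.lower():  # any mention of the marker counts as a decline
        return "rag", answer
    return "lc", llm(lc_prompt(query, document)).strip()


doc = "Paris is the capital of France.\n\nThe Nile is a river in Africa."


def stub_llm(prompt: str) -> str:
    # Hypothetical model stub: in the RAG step it answers only when "Paris" appears in the prompt.
    if f"reply '{UNANSWERABLE}'" in prompt:
        return "Paris" if "Paris" in prompt else UNANSWERABLE
    return "answer from full context"


print(self_route("capital of France?", doc, toy_retrieve, stub_llm, k=1))  # → ('rag', 'Paris')
```

Keeping the retriever and model as plain callables makes the routing logic independent of any particular retrieval library or LLM API; in practice both would be replaced with real components.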


Taxonomy

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Weight Decay · WordPiece · Attention Dropout · Adam · Label Smoothing · Linear Layer · Byte Pair Encoding