Long Context RAG Performance of Large Language Models
Quinn Leng, Jacob Portes, Sam Havens, Matei Zaharia, Michael Carbin

TL;DR
This study investigates how increasing context length affects Retrieval Augmented Generation (RAG) performance across 20 open source and commercial Large Language Models and three domain-specific datasets, reporting the benefits, limitations, and failure modes it observes.
Contribution
The paper runs RAG workflows at total context lengths from 2,000 to 128,000 tokens (and 2 million tokens where a model supports it), and uses the results to show what recent LLMs can and cannot do at extended context lengths.
Findings
Retrieving more documents can enhance RAG performance.
Most state-of-the-art LLMs struggle to maintain accuracy above 64k tokens.
Distinct failure modes emerge in long context scenarios.
Abstract
Retrieval Augmented Generation (RAG) has emerged as a crucial technique for enhancing the accuracy of Large Language Models (LLMs) by incorporating external information. With the advent of LLMs that support increasingly longer context lengths, there is a growing interest in understanding how these models perform in RAG scenarios. Can these new long context models improve RAG performance? This paper presents a comprehensive study of the impact of increased context length on RAG performance across 20 popular open source and commercial LLMs. We ran RAG workflows while varying the total context length from 2,000 to 128,000 tokens (and 2 million tokens when possible) on three domain-specific datasets, and report key insights on the benefits and limitations of long context in RAG applications. Our findings reveal that while retrieving more documents can improve performance, only a handful of…
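The setup described in the abstract, varying the total context length given to a RAG workflow, can be illustrated with a minimal sketch. This is not the authors' code. The function name `pack_context`, the whitespace token counter, and the intermediate budget values are all assumptions made for illustration; only the 2,000 and 128,000 token endpoints come from the abstract.

```python
from typing import Callable, List, Tuple


def whitespace_token_count(text: str) -> int:
    # Assumption: a crude stand-in for a real model tokenizer.
    return len(text.split())


def pack_context(
    ranked_chunks: List[str],
    budget_tokens: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> Tuple[List[str], int]:
    """Greedily keep top-ranked chunks until the next one would exceed the budget."""
    selected: List[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += cost
    return selected, used


# Illustrative budgets: the endpoints follow the abstract, the middle values are assumed.
ILLUSTRATIVE_BUDGETS = [2_000, 8_000, 32_000, 128_000]

if __name__ == "__main__":
    # Hypothetical retrieval output, already sorted by relevance.
    chunks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
    for budget in (3, 5, 9):
        kept, used = pack_context(chunks, budget)
        print(budget, len(kept), used)
```

In a sweep of this kind, the same question would be answered once per budget, so that any change in answer quality can be attributed to how many retrieved chunks fit in the context.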
Taxonomy
Topics: Topic Modeling
Methods: Linear Layer · Softmax · Dropout · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · WordPiece · Adam · Attention Is All You Need
