Does RAG Really Perform Bad For Long-Context Processing?

Kun Luo; Zheng Liu; Peitian Zhang; Hongjin Qian; Jun Zhao; and Kang Liu

arXiv:2502.11444 · cs.CL · February 18, 2025


Open Access

TL;DR

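This paper introduces RetroLM, a retrieval-augmented generation (RAG) framework for long-context processing in large language models. Unlike traditional RAG methods, RetroLM retrieves at the KV level: it partitions the LLM's KV cache into contiguous pages and retrieves only the most crucial ones, which makes it more robust to retrieval inaccuracy, lets it use fragmented contexts effectively, and lowers computation cost.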

Contribution

RetroLM employs KV-level retrieval augmentation and specialized retrievers, advancing long-context LLM performance and robustness over existing methods.

Findings

01

RetroLM significantly outperforms existing long-context LLMs.

02

It demonstrates superior reasoning and comprehension on long-context benchmarks.

03

The framework reduces computational cost by retrieving only the most crucial KV-cache pages while still making effective use of fragmented context.

Abstract

The efficient processing of long context poses a serious challenge for large language models (LLMs). Recently, retrieval-augmented generation (RAG) has emerged as a promising strategy for this problem, as it enables LLMs to make selective use of the long context for efficient computation. However, existing RAG approaches lag behind other long-context processing methods due to inherent limitations on inaccurate retrieval and fragmented contexts. To address these challenges, we introduce RetroLM, a novel RAG framework for long-context processing. Unlike traditional methods, RetroLM employs KV-level retrieval augmentation, where it partitions the LLM's KV cache into contiguous pages and retrieves the most crucial ones for efficient computation. This approach enhances robustness to retrieval inaccuracy, facilitates effective utilization of fragmented contexts, and saves the cost from…
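The page-level retrieval idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names `partition_into_pages` and `select_pages`, the `page_size` and `top_k` parameters, and in particular the scoring rule (dot product between the query vector and each page's mean key), which stands in for the specialized retriever that RetroLM actually uses.

```python
from typing import List, Sequence

Vector = Sequence[float]


def partition_into_pages(keys: List[Vector], page_size: int) -> List[List[Vector]]:
    """Split per-token key vectors into contiguous, fixed-size pages."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [keys[i:i + page_size] for i in range(0, len(keys), page_size)]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Average a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_pages(keys: List[Vector], query: Vector,
                 page_size: int, top_k: int) -> List[int]:
    """Return indices of the top_k highest-scoring pages, in document order.

    Scoring by mean-key dot product is a hypothetical stand-in for a
    learned page retriever; it is NOT the method described in the paper.
    """
    pages = partition_into_pages(keys, page_size)
    scores = [dot(mean_vector(page), query) for page in pages]
    ranked = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
    # Restore original order so the selected pages stay contiguous in sequence.
    return sorted(ranked[:top_k])


if __name__ == "__main__":
    # Six toy 2-d key vectors, grouped into three pages of two tokens each.
    toy_keys = [[1.0, 0.0], [1.0, 0.0],
                [0.0, 1.0], [0.0, 1.0],
                [-1.0, 0.0], [-1.0, 0.0]]
    toy_query = [0.0, 1.0]
    print(select_pages(toy_keys, toy_query, page_size=2, top_k=1))
```

In this toy setup only the selected pages would be kept for attention, so the cost of generation scales with `top_k * page_size` rather than with the full context length.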


Taxonomy

Topics: Neural Networks and Applications · Topic Modeling · Machine Learning in Healthcare

Methods: Attention Is All You Need · Byte Pair Encoding · Adam · Softmax · Dropout · Weight Decay · BART · WordPiece · Layer Normalization