Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing

Xiaoju Ye; Zhichun Wang; Jingyuan Wang

arXiv:2502.12962 · cs.CL · February 19, 2025


TL;DR

This paper introduces InfiniRetri, a novel method leveraging LLMs' attention to enable accurate retrieval in infinitely long contexts, significantly improving performance and efficiency without additional training.

Contribution

It proposes a new attention-based retrieval method for LLMs that handles infinite-length inputs, outperforming existing techniques and reducing computational costs.

Findings

01

Achieves 100% accuracy on the Needle-In-a-Haystack (NIH) test with 1M tokens using a 0.5B-parameter model.

02

Surpasses other methods and larger models in benchmarks.

03

Reduces inference latency and compute overhead.

Abstract

Because Large Language Models (LLMs) are limited by their context window size, handling tasks whose input tokens exceed that limit has been challenging, whether the task is simple direct retrieval or complex multi-hop reasoning. Although various methods have been proposed to enhance the long-context processing capabilities of LLMs, they either incur substantial post-training costs, require additional tool modules (e.g., RAG), or have not shown significant improvement on realistic tasks. Our work observes the correlation between the attention distribution and the generated answers across each layer, and establishes through experiments that attention allocation aligns with retrieval-augmented capabilities. Drawing on these insights, we propose a novel method, InfiniRetri, that leverages the LLM's own attention information to enable accurate retrieval across inputs of infinitely…
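The abstract shown here is cut off before it describes the procedure, so the following is only a toy sketch of the general idea of using attention weights as a retrieval signal, not the authors' InfiniRetri implementation. Every name in it (`attention_weights`, `top_sentences`, the toy vectors) is hypothetical, and the hand-written vectors stand in for the query and key states a real model would produce.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query_vecs, context_vecs):
    """Scaled dot-product attention: one softmax row per query token."""
    scale = math.sqrt(len(context_vecs[0]))
    return [
        softmax([sum(q * k for q, k in zip(qv, kv)) / scale for kv in context_vecs])
        for qv in query_vecs
    ]


def top_sentences(weights, sentence_ids, k):
    """Pool the attention each context token receives by sentence; keep the top k."""
    scores = {}
    for row in weights:
        for w, sid in zip(row, sentence_ids):
            scores[sid] = scores.get(sid, 0.0) + w
    ranked = sorted(scores, key=lambda sid: scores[sid], reverse=True)
    return sorted(ranked[:k])  # return ids in document order


# Toy data (assumption): five context tokens spread over three sentences.
context_vecs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 5.0]]
sentence_ids = [0, 0, 1, 1, 2]
query_vecs = [[0.0, 4.0]]  # one query token that aligns with sentence 2

weights = attention_weights(query_vecs, context_vecs)
selected = top_sentences(weights, sentence_ids, k=1)
```

In this toy setup the sentence whose token best matches the query receives nearly all of the attention mass, so it is the one kept. How the paper selects layers, segments the input, or carries retrieved text forward is not stated in the excerpt above and is not modeled here.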


Taxonomy

Topics: Image Retrieval and Classification Techniques · Topic Modeling · Advanced Image and Video Retrieval Techniques

Methods: Softmax · Attention Is All You Need