Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing
Xiaoju Ye, Zhichun Wang, Jingyuan Wang

TL;DR
This paper introduces InfiniRetri, a novel method leveraging LLMs' attention to enable accurate retrieval in infinitely long contexts, significantly improving performance and efficiency without additional training.
Contribution
It proposes a new attention-based retrieval method for LLMs that handles infinite-length inputs, outperforming existing techniques and reducing computational costs.
Findings
Achieves 100% accuracy on the Needle-In-a-Haystack (NIH) test with 1M tokens using a 0.5B model.
Surpasses other methods and larger models in benchmarks.
Reduces inference latency and compute overhead.
Abstract
Limited by the context window size of Large Language Models (LLMs), handling various tasks with input tokens exceeding the upper limit has been challenging, whether it is a simple direct retrieval task or a complex multi-hop reasoning task. Although various methods have been proposed to enhance the long-context processing capabilities of LLMs, they either incur substantial post-training costs, require additional tool modules (e.g., RAG), or have not shown significant improvement in realistic tasks. Our work observes the correlation between the attention distribution and generated answers across each layer, and establishes through experiments that attention allocation aligns with retrieval-augmented capabilities. Drawing on the above insights, we propose a novel method, InfiniRetri, that leverages the LLMs' own attention information to enable accurate retrieval across inputs of infinitely…
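The abstract's central idea, that a model's own attention weights can indicate which parts of a long input matter for the answer, can be illustrated with a toy sketch. This is not the paper's InfiniRetri algorithm, whose details are not given in the text above. It is a minimal illustration that assumes we already have raw attention scores from query tokens to context tokens. The function names, the per-sentence averaging, and the top-k selection rule are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_sentences(attn_logits, sentence_spans, top_k):
    """Pick the top_k sentences that receive the most attention from the query.

    attn_logits: one row per query token, one column per context token
                 (raw, pre-softmax scores; assumed to be given).
    sentence_spans: list of (start, end) context-token index pairs, end exclusive.
    Returns the indices of the selected sentences in document order.
    """
    n_ctx = len(attn_logits[0])

    # Accumulate, for each context token, the attention it receives
    # from every query token after softmax normalization.
    token_mass = [0.0] * n_ctx
    for row in attn_logits:
        for j, w in enumerate(softmax(row)):
            token_mass[j] += w

    # Score each sentence by its mean per-token attention mass
    # (an illustrative choice, not taken from the paper).
    sent_scores = [sum(token_mass[s:e]) / (e - s) for s, e in sentence_spans]

    # Keep the top_k highest-scoring sentences, restored to document order.
    ranked = sorted(range(len(sentence_spans)),
                    key=lambda i: sent_scores[i], reverse=True)
    return sorted(ranked[:top_k])


# Toy input: 2 query tokens attending over 6 context tokens in 3 sentences.
toy_logits = [
    [0.0, 0.0, 5.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 5.0, 1.0, 0.0],
]
toy_spans = [(0, 2), (2, 4), (4, 6)]
print(select_sentences(toy_logits, toy_spans, top_k=1))
```

In this toy input the middle sentence receives nearly all of the attention mass, so it is the one retained. A real system would read attention weights from a model layer rather than from a hand-written matrix.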
Taxonomy
Topics: Image Retrieval and Classification Techniques · Topic Modeling · Advanced Image and Video Retrieval Techniques
Methods: Softmax · Attention Is All You Need
