Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism
Yimin Tang, Yurong Xu, Ning Yan, Masood Mortazavi

TL;DR
This paper introduces ILM-TR, a novel retrieval method that iteratively refines context understanding in LLMs by using inner-loop queries and short-term memory, significantly improving performance on long-context tasks.
Contribution
The paper proposes ILM-TR, a new retrieval mechanism with inner-loop queries and short-term memory, enhancing long context handling in LLMs beyond traditional RAG methods.
Findings
ILM-TR improves long-context task performance.
Retrieval with STM outperforms traditional RAG methods.
ILM-TR is effective on complex questions that require multi-step reasoning.
Abstract
The computational complexity of Transformers scales quadratically with input size, which limits the context window of large language models (LLMs) in both training and inference. Meanwhile, retrieval-augmented generation (RAG) based models can better handle longer contexts by using a retrieval system to filter out unnecessary information. However, most RAG methods perform retrieval based only on the initial query, which may not work well for complex questions that require deeper reasoning. We introduce a novel approach, Inner Loop Memory Augmented Tree Retrieval (ILM-TR), involving inner-loop queries that are based not only on the query question itself but also on intermediate findings. At inference time, our model retrieves information from the RAG system, integrating data from lengthy documents at various levels of abstraction. Based on the information retrieved, the LLM generates…
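The inner-loop idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-overlap retriever, the `toy_generate` stand-in for the LLM, the stopping rule (stop when the short-term memory no longer changes), and all names and sample data are assumptions made for illustration. The tree-structured, multi-level retrieval is not modeled here.

```python
import re
from typing import Callable, List

Memory = List[str]  # hypothetical short-term memory: a list of intermediate findings


def tokens(text: str) -> set:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank chunks by word overlap with the query."""
    q = tokens(query)
    hits = [(len(q & tokens(c)), c) for c in chunks]
    hits = [(score, c) for score, c in hits if score > 0]
    hits.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in hits[:k]]


def toy_generate(question: str, context: List[str], memory: Memory) -> Memory:
    """Stand-in for the LLM (assumption): add newly retrieved facts to memory."""
    return memory + [c for c in context if c not in memory]


def inner_loop_answer(
    question: str,
    chunks: List[str],
    generate: Callable[[str, List[str], Memory], Memory],
    max_iters: int = 4,
    k: int = 2,
) -> Memory:
    """Re-query with the question plus intermediate findings until memory is stable."""
    memory: Memory = []
    for _ in range(max_iters):
        # The inner-loop query combines the original question with short-term memory.
        query = " ".join([question, *memory])
        context = retrieve(query, chunks, k)
        updated = generate(question, context, memory)
        if updated == memory:  # nothing new was learned, so stop
            break
        memory = updated
    return memory


# Hypothetical sample data for a two-hop question.
CHUNKS = [
    "Alice works at Acme.",
    "Acme is headquartered in Oslo.",
    "Bob likes tea.",
]
QUESTION = "Where does Alice's employer have its base?"

single_shot = retrieve(QUESTION, CHUNKS)
looped = inner_loop_answer(QUESTION, CHUNKS, toy_generate)
```

In this toy setup, retrieval on the initial question alone finds only the fact about Alice, because the question shares no words with the Oslo fact. Once the first finding is in memory, the second query mentions "Acme" and reaches the second fact, which is the kind of multi-step case that retrieval on the initial query alone tends to miss.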
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Service-Oriented Architecture and Web Services
Methods: Linear Layer · Byte Pair Encoding · Softmax · Multi-Head Attention · WordPiece · Dropout · Layer Normalization · Adam · Attention Dropout
