Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism

Yimin Tang; Yurong Xu; Ning Yan; Masood Mortazavi

arXiv:2410.12859·cs.CL·October 18, 2024

TL;DR

This paper introduces ILM-TR, a novel retrieval method that iteratively refines context understanding in LLMs by using inner-loop queries and short-term memory, significantly improving performance on long-context tasks.

Contribution

The paper proposes ILM-TR, a new retrieval mechanism with inner-loop queries and short-term memory, enhancing long context handling in LLMs beyond traditional RAG methods.

Findings

1. ILM-TR improves long-context task performance.

2. Retrieval with short-term memory (STM) outperforms traditional RAG methods.

3. The method is effective in complex reasoning scenarios.

Abstract

The computational complexity of Transformers scales quadratically with input size, which limits the input context window of large language models (LLMs) in both training and inference. Meanwhile, retrieval-augmented generation (RAG) based models can better handle longer contexts by using a retrieval system to filter out unnecessary information. However, most RAG methods perform retrieval based only on the initial query, which may not work well for complex questions that require deeper reasoning. We introduce a novel approach, Inner Loop Memory Augmented Tree Retrieval (ILM-TR), involving inner-loop queries that are based not only on the query question itself but also on intermediate findings. At inference time, our model retrieves information from the RAG system, integrating data from lengthy documents at various levels of abstraction. Based on the information retrieved, the LLM generates…
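The abstract is cut off before it describes the full procedure, so the following is only a minimal sketch of the general idea of inner-loop querying with a short-term memory, not the authors' implementation. Everything in it is an assumption made for illustration: the word-overlap retriever, the flat passage list (the paper retrieves from a tree of summaries at several abstraction levels), the hypothetical `toy_generate` stand-in for the LLM, the rule that skips passages already in memory, and the stopping rule that ends the loop once memory stops changing.

```python
from typing import Callable, List


def overlap_score(query: str, passage: str) -> int:
    """Toy relevance score: number of distinct lowercase words shared."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def retrieve(query: str, passages: List[str], k: int = 1) -> List[str]:
    """Return the k passages with the highest overlap score.

    Stand-in for a real retriever; not the paper's tree retrieval.
    """
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def toy_generate(question: str, memory: List[str], context: List[str]) -> List[str]:
    """Hypothetical stand-in for the LLM step.

    Keeps a retrieved passage as an "intermediate finding" only if it shares
    at least one word with the question or with what is already in memory.
    """
    known = " ".join([question] + memory)
    kept = [p for p in context if overlap_score(known, p) > 0]
    return memory + kept


def inner_loop_answer(
    question: str,
    passages: List[str],
    generate: Callable[[str, List[str], List[str]], List[str]],
    max_rounds: int = 4,
) -> List[str]:
    """Repeat retrieval, each time querying with question + short-term memory."""
    memory: List[str] = []  # short-term memory of intermediate findings
    for _ in range(max_rounds):
        # Inner-loop query: the question plus the findings gathered so far.
        query = " ".join([question] + memory)
        # Assumption: do not retrieve passages that are already in memory.
        candidates = [p for p in passages if p not in memory]
        context = retrieve(query, candidates)
        new_memory = generate(question, memory, context)
        if new_memory == memory:  # assumed stopping rule: nothing new learned
            break
        memory = new_memory
    return memory


if __name__ == "__main__":
    docs = [
        "alice lives in paris",
        "paris is the capital of france",
        "bob visited rome",
    ]
    q = "which country does alice live in"
    print("single-shot:", retrieve(q, docs))
    print("inner loop: ", inner_loop_answer(q, docs, toy_generate))
```

In this toy setup, a single retrieval on the question alone surfaces only the passage about Alice, because the passage about France shares no words with the question. Once the first finding is placed in memory, the word "paris" becomes part of the next query and the second passage is reachable, which is the kind of multi-step dependency the abstract says initial-query-only RAG handles poorly.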

Taxonomy

Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Service-Oriented Architecture and Web Services

Methods: Linear Layer · Byte Pair Encoding · Softmax · Multi-Head Attention · WordPiece · Dropout · Layer Normalization · Adam · Attention Dropout