Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Melisa Russak, Umar Jamil, Christopher Bryant, Kiran Kamble, Axel Magnuson, Mateusz Russak, Waseem AlShikh

arXiv:2408.14906 · cs.CL · August 28, 2024


TL;DR

Writing in the Margins (WiM) is a new inference pattern for large language models that improves the handling of long contexts in retrieval tasks, boosting accuracy and F1 scores with minimal computational overhead.

Contribution

WiM introduces a segment-wise inference method that improves long-context processing in off-the-shelf LLMs without fine-tuning, yielding better performance on reasoning and aggregation tasks.

Findings

1. 7.5% average accuracy improvement on reasoning tasks (HotpotQA, MultiHop-RAG)
2. Over 30% F1-score increase on aggregation tasks (CWE)
3. Efficient long-context handling with only marginal computational overhead
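The aggregation result above is reported as an F1 score. As a minimal sketch, assuming CWE answers are scored as a set of predicted words compared against a set of gold words (the paper's exact scoring protocol may differ), set-level F1 is the harmonic mean of precision and recall over those sets. The helper name `set_f1` is hypothetical.

```python
from typing import Iterable


def set_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 between a predicted set of items and a gold set (duplicates ignored)."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0  # no overlap (also covers one side being empty)
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 2 of 4 predictions are correct (P = 1/2), 2 of 3 gold words are found (R = 2/3)
    print(set_f1(["apple", "banana", "cherry", "date"], ["apple", "banana", "fig"]))  # 4/7, about 0.571
```

Because the score is computed on sets, repeating a correct word does not raise precision, which suits a task whose answer is a list of distinct words.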

Abstract

In this paper, we introduce Writing in the Margins (WiM), a new inference pattern for Large Language Models designed to optimize the handling of long input sequences in retrieval-oriented tasks. This approach leverages the chunked prefill of the key-value cache to perform segment-wise inference, which enables efficient processing of extensive contexts along with the generation and classification of intermediate information ("margins") that guide the model towards specific tasks. This method increases computational overhead marginally while significantly enhancing the performance of off-the-shelf models without the need for fine-tuning. Specifically, we observe that WiM provides an average enhancement of 7.5% in accuracy for reasoning skills (HotpotQA, MultiHop-RAG) and more than a 30.0% increase in the F1-score for aggregation tasks (CWE). Additionally, we show how the proposed pattern…
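The segment-wise pattern described in the abstract can be sketched as a loop: prefill one more segment, write a margin conditioned on everything read so far plus the query, keep the margin only if it is classified as relevant, and finally answer from the full context plus the kept margins. This is a toy sketch under stated assumptions, not the authors' implementation: the list `prefix` merely stands in for the KV cache, segments are counted in words rather than tokens, and `toy_margin`, `toy_is_relevant`, and `toy_answer` are hypothetical stand-ins for LLM calls.

```python
from typing import Callable, List


def split_into_segments(text: str, max_words: int) -> List[str]:
    """Split the context into consecutive segments of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def writing_in_the_margins(
    context: str,
    query: str,
    max_words: int,
    write_margin: Callable[[List[str], str], str],
    is_relevant: Callable[[str, str], bool],
    answer: Callable[[List[str], List[str], str], str],
) -> str:
    prefix: List[str] = []   # stand-in for the KV cache, extended one segment at a time
    margins: List[str] = []  # margins kept after classification
    for segment in split_into_segments(context, max_words):
        prefix.append(segment)                # chunked prefill of one more segment
        margin = write_margin(prefix, query)  # intermediate note on the context read so far
        if is_relevant(margin, query):        # classify the margin; discard irrelevant ones
            margins.append(margin)
    # Final step: answer using the full prefilled context plus the collected margins.
    return answer(prefix, margins, query)


# Hypothetical toy stand-ins for LLM calls, used only to make the sketch runnable.
def toy_margin(prefix: List[str], query: str) -> str:
    # Toy margin writer: copies words from the newest segment that also occur in the query.
    query_words = set(query.lower().split())
    return " ".join(w for w in prefix[-1].split() if w.lower() in query_words)


def toy_is_relevant(margin: str, query: str) -> bool:
    # Toy classifier: a margin is relevant if it is non-empty.
    return margin != ""


def toy_answer(prefix: List[str], margins: List[str], query: str) -> str:
    # Toy answerer: simply joins the kept margins.
    return " | ".join(margins)


if __name__ == "__main__":
    context = "the cat sat on the mat while a dog slept by the door"
    print(writing_in_the_margins(context, "dog cat", 4, toy_margin, toy_is_relevant, toy_answer))
    # prints: cat | dog
```

The prefix grows rather than being reset per segment, mirroring the chunked-prefill idea in the abstract: each margin may condition on all context read so far, while the relevance filter keeps only useful notes for the final answer.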
