Overflow Prevention Enhances Long-Context Recurrent LLMs

Assaf Ben-Kish; Itamar Zimerman; M. Jehanzeb Mirza; Lior Wolf; James Glass; Leonid Karlinsky; Raja Giryes

arXiv:2505.07793 · cs.LG · September 10, 2025

TL;DR

This paper shows that a simple chunk-based inference method significantly improves long-context processing in recurrent LLMs, reaches state-of-the-art results on LongBench v2, and raises the question of whether these models actually exploit long-range dependencies.

Contribution

The paper introduces a chunk-based inference approach that enhances long-context performance in recurrent LLMs, outperforming existing methods and challenging assumptions about their long-range dependency modeling.

Findings

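01 Chunk-based inference improves overall LongBench performance by 14% to 51% across the four models tested, with the largest gain (51%) on RWKV6-Finch-7B.

02 Because recurrent models perform better when only a single relevant chunk is processed, it is questionable whether they make real use of long-range dependencies.

03 The method achieves state-of-the-art results on LongBench v2.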

Abstract

A recent trend in LLMs is developing recurrent sub-quadratic models that improve long-context processing efficiency. We investigate leading large long-context models, focusing on how their fixed-size recurrent memory affects their performance. Our experiments reveal that, even when these models are trained for extended contexts, long contexts remain underutilized. Specifically, we demonstrate that a chunk-based inference procedure, which identifies and processes only the most relevant portion of the input, can mitigate recurrent memory failures and be effective for many long-context tasks: On LongBench, our method improves the overall performance of Falcon3-Mamba-Inst-7B by 14%, Falcon-Mamba-Inst-7B by 28%, RecurrentGemma-IT-9B by 50%, and RWKV6-Finch-7B by 51%. Surprisingly, this simple approach also leads to state-of-the-art results in the challenging LongBench v2…
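
The chunk-based procedure described in the abstract (split the input, find the most relevant portion, process only that portion) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the prompt template, and especially the word-overlap relevance score are assumptions made for illustration, since the summary above does not say how relevance is measured.

```python
from typing import Callable, List, Tuple


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def overlap_score(chunk: List[str], query: List[str]) -> float:
    """Toy relevance score (an assumption, not the paper's criterion):
    the fraction of distinct query words that also occur in the chunk."""
    query_words = {w.lower() for w in query}
    if not query_words:
        return 0.0
    chunk_words = {w.lower() for w in chunk}
    return len(query_words & chunk_words) / len(query_words)


def select_chunk(
    context: str,
    query: str,
    chunk_size: int,
    score: Callable[[List[str], List[str]], float] = overlap_score,
) -> Tuple[int, str]:
    """Return (index, text) of the highest-scoring chunk; ties go to the earliest chunk."""
    chunks = split_into_chunks(context.split(), chunk_size)
    if not chunks:
        raise ValueError("context is empty")
    query_tokens = query.split()
    scores = [score(chunk, query_tokens) for chunk in chunks]
    best = max(range(len(chunks)), key=scores.__getitem__)
    return best, " ".join(chunks[best])


def build_prompt(context: str, query: str, chunk_size: int) -> str:
    """Build a prompt that contains only the selected chunk plus the query,
    so the model never sees more than chunk_size context tokens."""
    _, chunk = select_chunk(context, query, chunk_size)
    return f"{chunk}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    document = (
        "alpha beta gamma delta the capital of France is Paris "
        "epsilon zeta eta theta"
    )
    print(build_prompt(document, "capital France", chunk_size=4))
```

The `score` parameter is left pluggable on purpose: a realistic setup would replace the toy word-overlap scorer with a signal derived from the model itself, while the splitting and selection steps stay the same. Keeping each chunk short is what bounds how much information the fixed-size recurrent state has to hold at once.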

Code & Models

Repositories

assafbk/OPRM (PyTorch, official)

Taxonomy

Topics: Software System Performance and Reliability · Security and Verification in Computing · Topic Modeling