Overflow Prevention Enhances Long-Context Recurrent LLMs
Assaf Ben-Kish, Itamar Zimerman, M. Jehanzeb Mirza, Lior Wolf, James Glass, Leonid Karlinsky, Raja Giryes

TL;DR
This paper shows that a simple chunk-based inference method significantly improves long-context processing in recurrent LLMs, achieving state-of-the-art results and raising the question of whether these models actually exploit long-range dependencies.
Contribution
The paper introduces a chunk-based inference approach that enhances long-context performance in recurrent LLMs, outperforming existing methods and challenging assumptions about their long-range dependency modeling.
Findings
Chunk-based inference improves overall LongBench performance by up to 51% (RWKV6-Finch-7B).
Recurrent models perform better with simple chunking, which calls into question whether they truly exploit long-range dependencies.
The method achieves state-of-the-art results on LongBench v2.
Abstract
A recent trend in LLMs is developing recurrent sub-quadratic models that improve long-context processing efficiency. We investigate leading large long-context models, focusing on how their fixed-size recurrent memory affects their performance. Our experiments reveal that, even when these models are trained for extended contexts, long contexts remain underutilized. Specifically, we demonstrate that a chunk-based inference procedure, which identifies and processes only the most relevant portion of the input, can mitigate recurrent memory failures and be effective for many long-context tasks. On LongBench, our method improves the overall performance of Falcon3-Mamba-Inst-7B by 14%, Falcon-Mamba-Inst-7B by 28%, RecurrentGemma-IT-9B by 50%, and RWKV6-Finch-7B by 51%. Surprisingly, this simple approach also leads to state-of-the-art results in the challenging LongBench v2…
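The chunk-based idea in the abstract (split the input, identify the most relevant portion, and process only that portion) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `split_into_chunks`, `overlap_score`, and `select_relevant_chunk` are hypothetical helpers, whitespace splitting stands in for a real tokenizer, and the word-overlap score is a placeholder rather than the selection criterion used in the paper, which the abstract does not specify.

```python
from typing import Callable, List


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive, non-overlapping chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def overlap_score(chunk: List[str], query: List[str]) -> float:
    """Placeholder relevance score (an assumption, not the paper's criterion):
    the fraction of distinct query words that also appear in the chunk."""
    query_words = {w.lower() for w in query}
    if not query_words:
        return 0.0
    chunk_words = {w.lower() for w in chunk}
    return len(query_words & chunk_words) / len(query_words)


def select_relevant_chunk(
    context: str,
    query: str,
    chunk_size: int,
    score_fn: Callable[[List[str], List[str]], float] = overlap_score,
) -> str:
    """Return the single highest-scoring chunk of the context as text.

    Whitespace splitting stands in for a real tokenizer. On ties, the
    earliest chunk wins, because max() keeps the first maximum it sees.
    """
    chunks = split_into_chunks(context.split(), chunk_size)
    if not chunks:
        return ""
    query_tokens = query.split()
    best = max(chunks, key=lambda chunk: score_fn(chunk, query_tokens))
    return " ".join(best)


if __name__ == "__main__":
    context = (
        "the cat sat on the mat . "
        "paris is the capital of france . "
        "dogs bark loudly at night"
    )
    query = "what is the capital of france"
    print(select_relevant_chunk(context, query, chunk_size=7))
```

In a full pipeline, the selected chunk and the query would be passed to the recurrent model in place of the whole context, so the prompt stays short relative to the fixed-size recurrent memory. A model-based relevance signal could be supplied through `score_fn` instead of word overlap.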
Taxonomy
Topics: Software System Performance and Reliability · Security and Verification in Computing · Topic Modeling
