Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks

Kai Liu; Zhan Su; Peijie Dong; Fengran Mo; Jianfei Gao; ShaoTing Zhang; Kai Chen

arXiv:2507.19353 · cs.CL · July 28, 2025

TL;DR

This paper introduces Smooth Reading, a chunk-wise inference method for Recurrent LLMs that improves their performance on long-context tasks, narrowing the gap with Self-Attention LLMs while maintaining efficiency.

Contribution

It proposes a novel chunk-wise inference approach inspired by human reading, enabling Recurrent LLMs to perform comparably to Self-Attention LLMs on long-context tasks.

Findings

1. Recurrent LLMs with Smooth Reading outperform previous methods on LongBench.

2. Smooth Reading moves Recurrent LLM performance from 5.68% below Self-Attention LLMs to 3.61% above them.

3. Compared to Self-Attention LLMs, the method trains 3x faster and runs inference 2x faster at 64k context length.

Abstract

Recently, recurrent large language models (Recurrent LLMs) with linear computational complexity have re-emerged as efficient alternatives to self-attention-based LLMs (Self-Attention LLMs), which have quadratic complexity. However, Recurrent LLMs often underperform on long-context tasks due to their limited fixed-size memory. Previous research has primarily focused on enhancing the memory capacity of Recurrent LLMs through architectural innovations, but these approaches have not yet enabled Recurrent LLMs to match the performance of Self-Attention LLMs on long-context tasks. We argue that this limitation arises because processing the entire context at once is not well-suited for Recurrent LLMs. In this paper, we propose Smooth Reading, a chunk-wise inference method inspired by human reading strategies. Smooth Reading processes context in chunks and iteratively summarizes the contextual…
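The chunk-wise idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is truncated, so what exactly is summarized and how is not shown here. The names `split_into_chunks`, `chunkwise_read`, and `toy_summarize` are hypothetical, and `toy_summarize` is a trivial stand-in for a real Recurrent LLM call. The sketch only shows the control flow: read one chunk at a time and carry forward a bounded running summary instead of the whole context.

```python
from typing import Callable, List


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def chunkwise_read(
    tokens: List[str],
    chunk_size: int,
    summarize: Callable[[str, List[str]], str],
) -> str:
    """Process the context chunk by chunk, updating a running summary each step.

    `summarize` is a hypothetical hook standing in for a model call that takes
    the previous summary plus the current chunk and returns a new summary.
    """
    summary = ""
    for chunk in split_into_chunks(tokens, chunk_size):
        summary = summarize(summary, chunk)
    return summary


def toy_summarize(summary: str, chunk: List[str], budget: int = 4) -> str:
    """Toy stand-in (assumption): keep capitalized words, capped at `budget` notes.

    The cap mimics a fixed-size memory: older notes are dropped once the
    budget is exceeded.
    """
    notes = summary.split() + [word for word in chunk if word[:1].isupper()]
    return " ".join(notes[-budget:])


if __name__ == "__main__":
    text = "Alice met Bob in Paris and later Carol joined them in Rome"
    print(chunkwise_read(text.split(), chunk_size=4, summarize=toy_summarize))
```

Because the summary stays bounded while the number of chunks grows with the context, each step works over roughly the same amount of text. This is one plausible reading of why a chunk-wise scheme could suit a model with fixed-size memory.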

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification