RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling

Xiuying Wei; Anunay Yadav; Razvan Pascanu; Caglar Gulcehre

arXiv:2507.04416 · cs.CL · November 20, 2025

TL;DR

RAT introduces a chunk-based sequence model that combines RNN efficiency with attention accuracy, enabling faster training and inference for long sequences while maintaining performance.

Contribution

The paper proposes RAT, an intermediate design between recurrence and attention: it partitions the input into chunks, applies recurrence within each chunk for local dependencies, and applies softmax attention across chunks for long-range interactions.

Findings

01. 7× faster training on 100K-token sequences

02. 9× faster generation at the 4K position

03. Performance comparable to standard attention

Abstract

Transformers have become the cornerstone of modern large-scale language models, but their reliance on softmax attention poses a computational bottleneck at both training and inference. Recurrent models offer high efficiency, but compressing the full sequence into a fixed-size and holistic representation can suffer from memory degradation in long contexts and limit fine-grained retrieval. To address this, we propose RAT, an intermediate design that bridges the efficiency of RNNs and capacity of attention. RAT partitions the input into chunks, applies recurrence within each chunk for local dependencies, and softmax-based attention across chunks for long-range interactions. This design mitigates memory degradation and enables direct access to distant tokens, while retaining computational efficiency. Empirically, with a chunk size of 16, the RAT block achieves a 7 $\times$ improvement in…
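The chunking scheme described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the gated recurrence, the use of each chunk's final state as its summary, the inclusion of the token's own running state in the attention set, and all names (chunked_recurrence, rat_like_layer, gate, chunk_size) are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def chunked_recurrence(x, gate, chunk_size):
    """Gated linear recurrence that restarts at every chunk boundary.

    x, gate: arrays of shape (T, d); gate values lie in [0, 1].
    Returns h of shape (T, d), where h[t] summarises the tokens of the
    current chunk up to and including position t.
    """
    T, d = x.shape
    h = np.zeros_like(x)
    for t in range(T):
        if t % chunk_size == 0:
            prev = np.zeros(d)  # reset the state at the start of each chunk
        h[t] = gate[t] * prev + (1.0 - gate[t]) * x[t]
        prev = h[t]
    return h

def rat_like_layer(q, k, v, gate, chunk_size):
    """Recurrence inside chunks, causal softmax attention across chunks.

    Assumption: keys and values share one gate, and a query at position t
    attends to the final state of every completed chunk plus its own
    running state in the current chunk.
    """
    T, d = q.shape
    k_h = chunked_recurrence(k, gate, chunk_size)
    v_h = chunked_recurrence(v, gate, chunk_size)
    out = np.zeros_like(v)
    for t in range(T):
        c = t // chunk_size  # index of the chunk containing position t
        ends = [(j + 1) * chunk_size - 1 for j in range(c)]  # earlier chunk ends
        idx = ends + [t]
        scores = k_h[idx] @ q[t] / np.sqrt(d)
        out[t] = softmax(scores) @ v_h[idx]
    return out

rng = np.random.default_rng(0)
T, d = 8, 4
q, k, v = rng.normal(size=(3, T, d))
gate = np.full((T, d), 0.5)
out = rat_like_layer(q, k, v, gate, chunk_size=4)
print(out.shape)
```

In this sketch, setting chunk_size to the sequence length leaves only the recurrence, while chunk_size of 1 with a zero gate gives ordinary causal softmax attention, which is one way to read the abstract's "intermediate design". Each query scores at most ceil(T / chunk_size) entries rather than up to T.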

Code & Models

Models

🤗 barpitf/RAT · model · ♡ 2

Videos

RAT: Bridging RNN Efficiency and Attention Accuracy via Chunk-based Sequence Modeling · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Neural Network Applications · Topic Modeling