Intra-Layer Recurrence in Transformers for Language Modeling

Anthony Nguyen; Wenjun Lin

arXiv:2505.01855·cs.CL·May 27, 2025

TL;DR

This paper introduces Intra-Layer Recurrence (ILR), a targeted method for applying recurrence within individual transformer layers, improving efficiency and performance in language modeling.

Contribution

The paper proposes ILR, an approach that re-applies selected individual layers within a single forward pass, rather than looping over whole blocks of layers, so that extra iterations can be allocated where they help most without adding layers to the model.

Findings

1. Allocating more recurrence to early layers improves performance.
2. ILR enhances transformer efficiency and effectiveness.
3. Targeted recurrence outperforms uniform recurrence strategies.

Abstract

Transformer models have established new benchmarks in natural language processing; however, their increasing depth results in substantial growth in parameter counts. While existing recurrent transformer methods address this issue by reprocessing layers multiple times, they often apply recurrence indiscriminately across entire blocks of layers. In this work, we investigate Intra-Layer Recurrence (ILR), a more targeted approach that applies recurrence selectively to individual layers within a single forward pass. Our experiments show that allocating more iterations to earlier layers yields optimal results. These findings suggest that ILR offers a promising direction for optimizing recurrent structures in transformer architectures.
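
The contrast the abstract draws, between re-running a whole block of layers and re-running selected individual layers, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the names `ilr_schedule`, `block_schedule`, and `forward` are hypothetical, the per-layer iteration list is just one plausible way to express the allocation, and the "layers" are toy scalar functions standing in for transformer layers. The authors' actual implementation is in the repository listed under Code & Models.

```python
from typing import Callable, List, Sequence

# A toy "layer": any function from a value to a value.
# Real transformer layers would map tensors to tensors.
Layer = Callable[[float], float]


def ilr_schedule(iterations: Sequence[int]) -> List[int]:
    """Expand per-layer iteration counts into an order of layer applications.

    iterations[i] is how many consecutive times layer i is applied
    (hypothetical representation, not taken from the paper's code).
    """
    if any(r < 1 for r in iterations):
        raise ValueError("each layer must run at least once")
    return [i for i, r in enumerate(iterations) for _ in range(r)]


def block_schedule(num_layers: int, loops: int) -> List[int]:
    """Block-level recurrence: the whole stack is re-run `loops` times."""
    return list(range(num_layers)) * loops


def forward(layers: Sequence[Layer], schedule: Sequence[int], x: float) -> float:
    """Apply layers in schedule order; a repeated index reuses the same layer."""
    for i in schedule:
        x = layers[i](x)
    return x


# Four toy layers; the parameter count is fixed by this list, not by the schedule.
layers = [lambda x: x + 1.0, lambda x: x * 2.0, lambda x: x - 3.0, lambda x: x * x]

early_heavy = ilr_schedule([2, 1, 1, 1])   # one extra iteration on the first layer
print(early_heavy)                         # [0, 0, 1, 2, 3]
print(block_schedule(4, 2))                # [0, 1, 2, 3, 0, 1, 2, 3]
print(forward(layers, early_heavy, 0.0))   # 1.0
```

In this sketch the schedule changes only the order and number of layer applications. The set of layers, and hence the number of parameters, stays the same, which is the property that makes recurrence attractive when depth is otherwise tied to parameter count.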

Code & Models

Repositories

ant-8/layer-recurrent-transformers (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques