Intra-Layer Recurrence in Transformers for Language Modeling
Anthony Nguyen, Wenjun Lin

TL;DR
This paper introduces Intra-Layer Recurrence (ILR), a targeted method for applying recurrence within individual transformer layers, improving efficiency and performance in language modeling.
Contribution
The paper proposes ILR, an approach that selectively applies recurrence to specific layers, optimizing transformer models without adding new layers (and therefore without adding parameters).
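The following is a minimal sketch of the intra-layer recurrence idea. It assumes each layer is a plain function from hidden state to hidden state, and that recurring on a layer means applying that same layer several times in a row before moving to the next one. The names `ilr_forward`, `effective_depth` and the per-layer `reuse_map` are illustrative, not the authors' code, and the toy arithmetic "layers" stand in for real attention and MLP blocks.

```python
from typing import Callable, Sequence

# Hypothetical stand-in: a "layer" is any function from hidden state to hidden state.
# In a real model this would be a transformer layer (attention + MLP) with its own weights.
Layer = Callable[[float], float]


def ilr_forward(x: float, layers: Sequence[Layer], reuse_map: Sequence[int]) -> float:
    """Apply layer i reuse_map[i] times back to back, then move on to layer i + 1.

    The same layer object (hence the same weights) is reused on every iteration,
    so the parameter count is that of len(layers) layers regardless of reuse_map;
    only the number of layer applications grows.
    """
    if len(layers) != len(reuse_map):
        raise ValueError("need exactly one reuse count per layer")
    if any(r < 1 for r in reuse_map):
        raise ValueError("every layer must run at least once")
    h = x
    for layer, r in zip(layers, reuse_map):
        for _ in range(r):
            h = layer(h)
    return h


def effective_depth(reuse_map: Sequence[int]) -> int:
    """Total number of layer applications in one forward pass."""
    return sum(reuse_map)


if __name__ == "__main__":
    toy_layers = [lambda h: h + 1.0, lambda h: h * 2.0]
    print(ilr_forward(0.0, toy_layers, [2, 1]))  # (0 + 1 + 1) * 2 = 4.0
    print(effective_depth([2, 1]))               # 3 applications from 2 distinct layers
```

Here two distinct layers yield an effective depth of three. Changing `reuse_map` changes how compute is distributed across layers without touching the set of weights.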
Findings
Allocating more recurrence to early layers improves performance.
ILR improves both the efficiency and the performance of transformer language models.
Targeted recurrence outperforms uniform recurrence strategies.
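To make "more recurrence in early layers" versus "uniform recurrence" concrete, the sketch below builds per-layer reuse maps that all spend the same total number of layer applications. The allocation rule in `front_loaded_map` (every layer runs once, then leftover iterations go to the earliest layers first, capped at `max_reuse`) is a hypothetical illustration and is not taken from the paper. The code only constructs the maps; it does not reproduce any experimental result.

```python
from typing import List


def front_loaded_map(num_layers: int, budget: int, max_reuse: int) -> List[int]:
    """Hypothetical rule: each layer runs once, then remaining iterations are
    assigned to the earliest layers first, with at most max_reuse per layer."""
    if budget < num_layers or budget > num_layers * max_reuse:
        raise ValueError("budget must lie between num_layers and num_layers * max_reuse")
    reuse = [1] * num_layers
    extra = budget - num_layers
    for i in range(num_layers):
        add = min(extra, max_reuse - 1)
        reuse[i] += add
        extra -= add
    return reuse


def uniform_map(num_layers: int, budget: int) -> List[int]:
    """Uniform recurrence: every layer gets the same number of iterations."""
    if budget % num_layers != 0:
        raise ValueError("budget must be a multiple of num_layers")
    return [budget // num_layers] * num_layers


if __name__ == "__main__":
    # Same budget: 8 layer applications from 4 distinct layers.
    front = front_loaded_map(4, 8, 3)  # [3, 3, 1, 1]
    uniform = uniform_map(4, 8)        # [2, 2, 2, 2]
    back = front[::-1]                 # [1, 1, 3, 3]
    for name, m in [("front", front), ("uniform", uniform), ("back", back)]:
        print(name, m, sum(m))
```

Holding the budget fixed is what makes such a comparison fair. Each map costs the same compute per forward pass and uses the same weights, so only the placement of the extra iterations differs.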
Abstract
Transformer models have established new benchmarks in natural language processing; however, their increasing depth results in substantial growth in parameter counts. While existing recurrent transformer methods address this issue by reprocessing layers multiple times, they often apply recurrence indiscriminately across entire blocks of layers. In this work, we investigate Intra-Layer Recurrence (ILR), a more targeted approach that applies recurrence selectively to individual layers within a single forward pass. Our experiments show that allocating more iterations to earlier layers yields optimal results. These findings suggest that ILR offers a promising direction for optimizing recurrent structures in transformer architectures.
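The abstract contrasts block-level recurrence with ILR. One way to see the difference is to list the order in which layers are executed. This sketch assumes block recurrence re-runs the entire layer stack a fixed number of times, which is a simplification because some methods loop over only a sub-block. The function names are hypothetical.

```python
from typing import List, Sequence


def block_recurrence_schedule(num_layers: int, num_passes: int) -> List[int]:
    """Block-level recurrence (simplified): the whole stack is re-run num_passes times."""
    return [i for _ in range(num_passes) for i in range(num_layers)]


def ilr_schedule(reuse_map: Sequence[int]) -> List[int]:
    """Intra-layer recurrence: layer i is repeated reuse_map[i] times in place."""
    return [i for i, r in enumerate(reuse_map) for _ in range(r)]


if __name__ == "__main__":
    # Both schedules perform 6 layer applications with a 3-layer stack.
    print(block_recurrence_schedule(3, 2))  # [0, 1, 2, 0, 1, 2]
    print(ilr_schedule([3, 2, 1]))          # [0, 0, 0, 1, 1, 2]
```

Block recurrence must give every layer the same number of passes. A per-layer reuse map, by contrast, can place the extra iterations wherever they help most, such as the earlier layers.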
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
