Learning to Skip the Middle Layers of Transformers

Tim Lawson; Laurence Aitchison

arXiv:2506.21103 · cs.LG · June 27, 2025

TL;DR

This paper introduces a mechanism that dynamically skips a variable number of middle layers in a Transformer, depending on the input. It is motivated by interpretability findings that middle layers are the most redundant, and its goal is to reduce compute.

Contribution

It proposes a novel architecture in which a learned gate decides, based on the input, whether to bypass a symmetric span of central blocks, and a gated attention mechanism prevents subsequent tokens from attending to skipped token positions.

Findings

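1. At the scales investigated, no improvement in the trade-off between validation cross-entropy and estimated FLOPs relative to dense baselines with fewer layers.

2. The gating is intended to spend less compute on 'simpler' tokens by skipping more of the middle layers for them.

3. Code released for reproducibility and further research.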

Abstract

Conditional computation is a popular strategy to make Transformers more efficient. Existing methods often target individual modules (e.g., mixture-of-experts layers) or skip layers independently of one another. However, interpretability research has demonstrated that the middle layers of Transformers exhibit greater redundancy, and that early layers aggregate information into token positions. Guided by these insights, we propose a novel architecture that dynamically skips a variable number of layers from the middle outward. In particular, a learned gating mechanism determines whether to bypass a symmetric span of central blocks based on the input, and a gated attention mechanism prevents subsequent tokens from attending to skipped token positions. Residual norms are controlled with a 'sandwich' or 'perilayernorm' scheme and gate sparsity with an adaptive regularization loss. We had…
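The gating and masking described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `skipped_blocks` and `attention_mask`, the hard boolean gates (a learned gate would typically be a soft value during training), and the indexing convention (a gate after block `l` skips blocks `l + 1` through `num_blocks - l - 2`) are all hypothetical choices made for the example. The released repository is the reference for the actual method.

```python
from typing import List


def skipped_blocks(num_blocks: int, gate_fires: List[bool]) -> List[int]:
    """Return the 0-indexed blocks one token skips (assumed convention).

    gate_fires[l] is True if the (hypothetical) gate placed after block l
    decides to skip. The first gate that fires removes a symmetric span
    around the middle of the stack: blocks l+1 .. num_blocks-l-2.
    """
    if num_blocks % 2 != 0:
        raise ValueError("this sketch assumes an even number of blocks")
    if len(gate_fires) != num_blocks // 2 - 1:
        raise ValueError("expected one gate per first-half block except the last")
    for l, fires in enumerate(gate_fires):
        if fires:
            # Earlier gates remove wider spans; later gates remove narrower ones.
            return list(range(l + 1, num_blocks - l - 1))
    return []  # no gate fired: the token passes through every block


def attention_mask(block: int, skipped: List[List[int]]) -> List[List[bool]]:
    """Causal attention mask for one block, hiding skipped positions.

    mask[i][j] is True when query position i may attend to key position j.
    A position that skipped this block is neither a query nor a key here,
    so later tokens cannot attend to it.
    """
    active = [block not in s for s in skipped]
    n = len(skipped)
    return [[active[i] and active[j] and j <= i for j in range(n)]
            for i in range(n)]


# Usage: an 8-block stack and three tokens; only the middle token's
# gate after block 1 fires, so it skips blocks 2..5.
per_token = [
    skipped_blocks(8, [False, False, False]),
    skipped_blocks(8, [False, True, False]),
    skipped_blocks(8, [False, False, False]),
]
mask_at_block_3 = attention_mask(3, per_token)
```

In this sketch the third token can attend to the first token and to itself at block 3, but not to the second token, whose position was skipped at that depth.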

Code & Models

Repositories

tim-lawson/skip-middle (PyTorch, official)

Taxonomy

Topics: Art, Technology, and Culture