Hyperloop Transformers

Abbas Zeitoun; Lucas Torroba-Hennigen; Yoon Kim

arXiv:2604.21254 · cs.LG · April 28, 2026

TL;DR

This paper introduces Hyperloop Transformers, a parameter-efficient architecture that combines looped Transformers with hyper-connections. It outperforms depth-matched Transformer baselines while using 50% fewer parameters, which targets memory-constrained language modeling.

Contribution

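The paper proposes the Hyperloop Transformer, an architecture that organizes its layers into begin, middle, and end blocks, applies only the middle block recurrently across depth, and augments that looped block with hyper-connections. Reusing layers in this way reduces the parameter count while maintaining or improving performance.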

Findings

01. Hyperloop Transformers outperform depth-matched Transformer baselines while using 50% fewer parameters.

02. The performance advantage persists after weight quantization, indicating that it is robust to quantization.

03. Hyperloop Transformers are suited to memory-constrained language modeling, such as edge and on-device deployment.
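The parameter saving in the first finding follows from layer reuse, and the arithmetic can be sketched as below. The layer counts are hypothetical and are not taken from the paper. The sketch counts only Transformer layers, assumes every layer has the same number of parameters, and ignores embeddings and any parameters added by hyper-connections.

```python
def layer_counts(begin: int, middle: int, end: int, loops: int) -> tuple[int, int]:
    """Return (unique_layers, effective_depth) for a begin/middle/end layout.

    Only the middle block is repeated, so its layers are stored once
    but applied `loops` times.
    """
    unique_layers = begin + middle + end
    effective_depth = begin + middle * loops + end
    return unique_layers, effective_depth


# Hypothetical configuration, chosen only to illustrate the arithmetic.
unique, depth = layer_counts(begin=2, middle=2, end=2, loops=4)

# A depth-matched ordinary Transformer stores `depth` distinct layers,
# so the fraction of layer parameters saved is 1 - unique / depth.
saving = 1 - unique / depth
print(f"unique layers: {unique}, effective depth: {depth}, saving: {saving:.0%}")
```

With these made-up numbers, 6 stored layers are applied as a depth of 12, which is one way a looped model can reach roughly half the layer parameters of a depth-matched baseline.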

Abstract

LLM architecture research generally aims to maximize model quality subject to fixed compute/latency budgets. However, many applications of interest such as edge and on-device deployment are further constrained by the model's memory footprint, thus motivating parameter-efficient architectures for language modeling. This paper describes a simple architecture that improves the parameter-efficiency of LLMs. Our architecture makes use of looped Transformers as a core primitive, which reuse Transformer layers across depth and are thus more parameter-efficient than ordinary (depth-matched) Transformers. We organize the looped Transformer into three blocks--begin, middle, and end blocks--where each block itself consists of multiple Transformer layers, and only the middle block is applied recurrently across depth. We augment the looped middle block with hyper-connections (Xie et al., 2026),…
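The begin/middle/end layout described in the abstract can be illustrated with a minimal structural sketch. It is not the authors' implementation: `ToyLayer` and `LoopedModel` are hypothetical names, each "layer" is a stand-in scalar residual update rather than a real Transformer layer, and hyper-connections are omitted because the abstract is truncated before describing them.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class ToyLayer:
    """Stand-in for a Transformer layer: a residual update with one scalar weight."""
    name: str
    weight: float

    def __call__(self, x: Vector) -> Vector:
        return [v + self.weight * v for v in x]


@dataclass
class LoopedModel:
    """Begin and end blocks run once; the middle block is reused `num_loops` times."""
    begin: List[ToyLayer]
    middle: List[ToyLayer]
    end: List[ToyLayer]
    num_loops: int
    trace: List[str] = field(default_factory=list)  # order in which layers were applied

    def _apply(self, layer: ToyLayer, x: Vector) -> Vector:
        self.trace.append(layer.name)
        return layer(x)

    def forward(self, x: Vector) -> Vector:
        self.trace.clear()
        for layer in self.begin:
            x = self._apply(layer, x)
        for _ in range(self.num_loops):  # same middle weights on every pass
            for layer in self.middle:
                x = self._apply(layer, x)
        for layer in self.end:
            x = self._apply(layer, x)
        return x


model = LoopedModel(
    begin=[ToyLayer("b0", 1.0)],
    middle=[ToyLayer("m0", 1.0), ToyLayer("m1", 1.0)],
    end=[ToyLayer("e0", 1.0)],
    num_loops=3,
)
output = model.forward([1.0])
print(model.trace)
```

The trace shows four stored layers applied as an effective depth of eight, which is the sense in which a looped model is more parameter-efficient than a depth-matched one.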

Code & Models

Model (Hugging Face): galimova/muon-looped-experiments
