LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation

Ahmadreza Jeddi; Marco Ciccone; Babak Taati

arXiv:2602.11451·cs.CL·February 13, 2026

TL;DR

LoopFormer introduces a flexible, budget-conditioned looped Transformer architecture that adapts its reasoning depth dynamically, improving efficiency and performance in language modeling and reasoning tasks under variable compute constraints.

Contribution

The paper proposes a shortcut-consistency training scheme for looped Transformers, enabling adaptive reasoning depth across variable-length trajectories.

Findings

1. Robust performance on language modeling benchmarks under compute constraints
2. Graceful scaling with additional computational budget
3. Effective latent reasoning with adaptive loop iterations

Abstract

Looped Transformers have emerged as an efficient and powerful class of models for reasoning in the language domain. Recent studies show that these models achieve strong performance on algorithmic and reasoning tasks, suggesting that looped architectures possess an inductive bias toward latent reasoning. However, prior approaches fix the number of loop iterations during training and inference, leaving open the question of whether these models can flexibly adapt their computational depth under variable compute budgets. We introduce LoopFormer, a looped Transformer trained on variable-length trajectories to enable budget-conditioned reasoning. Our core contribution is a shortcut-consistency training scheme that aligns trajectories of different lengths, ensuring that shorter loops yield informative representations while longer loops continue to refine them. LoopFormer conditions each loop…
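The elastic-depth idea in the abstract can be illustrated with a toy loop. This is a minimal sketch and not the authors' model: `shared_block` is a hypothetical scalar stand-in for a weight-tied Transformer block, and the normalized time `t` with uniform step size `dt = 1 / num_loops` is an assumed schedule that the excerpt does not specify. It only shows the intended behavior: the same block is reused for any budget, a short loop already gives a usable answer, and longer loops refine it.

```python
import math


def loop_schedule(num_loops):
    """Return (t, dt) pairs for a uniform schedule covering [0, 1).

    Assumption: uniform steps; the excerpt does not describe the schedule.
    """
    dt = 1.0 / num_loops
    return [(i * dt, dt) for i in range(num_loops)]


def shared_block(h, t, dt):
    """Toy weight-tied block: one Euler step of dh/dt = -h.

    A real block would be a Transformer layer modulated by (t, dt);
    t is accepted but unused in this scalar stand-in.
    """
    return h + dt * (-h)


def run_looped(h0, num_loops):
    """Apply the same block num_loops times under the given budget."""
    h = h0
    for t, dt in loop_schedule(num_loops):
        h = shared_block(h, t, dt)
    return h


# The exact solution of dh/dt = -h at t = 1, starting from h0 = 1.
reference = math.exp(-1.0)

budgets = [1, 2, 4, 8]
errors = [abs(run_looped(1.0, k) - reference) for k in budgets]

for k, err in zip(budgets, errors):
    print(f"loops={k}  error={err:.4f}")
```

In this toy setting the error shrinks as the loop budget grows, which mirrors the claim that shorter trajectories stay informative while longer ones keep refining the representation.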

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The proposed architecture is reasonably simple and well motivated. The overall direction, a Transformer-like model that can dynamically modulate compute, is important. The results showing the nice monotonicity of perplexity or accuracy vs. FLOPs (Fig. 2) are great.

Weaknesses

The experimental results are underwhelming, and maybe I missed the point, but it is unclear to me how this model improves on a vanilla Transformer. The results are presented in such a way that the training budget is not equalized (if I am correct? I do not understand the first sentence of 4.1), the inference FLOPs are, and the parameter counts are not, although this is generally not the limiting factor. Training budgets should have been equalized (e.g. pick the number of training iterations per mo

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1. The paper tackles a clear, practical, and important problem. Enabling flexible, elastic compute in parameter-efficient models like looped Transformers is a highly valuable research direction.
2. The paper is well written and easy to follow.
3. The experiments are thorough and convincing. In addition to strong task performance, the authors provide a compelling explanation for why LoopFormer works by analyzing metrics like curvature, anisotropy, and CKA similarity. They demonstrate that baselin
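For readers unfamiliar with the CKA similarity mentioned in this review, the following is a minimal sketch of the linear variant of centered kernel alignment. The excerpt does not say which kernel or implementation the paper uses, so the linear form and the helper names (`center_columns`, `cross_gram_sq_norm`, `linear_cka`) are assumptions for illustration. Inputs are two sets of representations for the same `n` examples, given as row-major lists.

```python
import math


def center_columns(m):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc))
    return numerator / denominator
```

The score lies in [0, 1], equals 1 when one representation is a scaled or rotated copy of the other, and can compare layers (or loop iterations) of different widths.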

Weaknesses

1. The training procedure (Algorithm 1) requires two forward passes per batch (for the full and short trajectories) to compute the consistency loss. This appears to roughly double the training cost compared to a standard looped model. The paper mentions this as a limitation but does not quantify it. A brief analysis of the training FLOPs/time overhead vs. a Base-Loop or TMLT baseline would be valuable for assessing the practical trade-offs.
2. The paper heavily emphasizes "latent reasoning," usi
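The overhead raised in the first point can be bounded with simple counting. This is a rough sketch under stated assumptions, not a figure from the paper: cost is taken to be proportional to the number of shared-block applications, the short trajectory uses no more loops than the full one, and embedding, output-head, and loss costs are ignored. The function name and its parameters are hypothetical.

```python
def relative_forward_cost(full_loops, short_loops):
    """Block applications of a two-pass step relative to a single full pass.

    Assumption: one pass over `full_loops` iterations plus one pass over
    `short_loops` iterations, with cost dominated by the shared block.
    """
    if not 1 <= short_loops <= full_loops:
        raise ValueError("expected 1 <= short_loops <= full_loops")
    return (full_loops + short_loops) / full_loops


for short in (1, 4, 8):
    ratio = relative_forward_cost(8, short)
    print(f"full=8 short={short} relative cost={ratio}")
```

Under these assumptions the overhead ranges from just above 1x (very short second trajectory) to exactly 2x (second trajectory as long as the first), so "roughly double" is the worst case rather than the typical one. Measured training time could differ, which is why the reviewer's request for reported numbers still stands.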

Reviewer 03 · Rating 8 · Confidence 3

Strengths

* The paper is well written and easy to follow, with clear motivation and setups
* The motivation is clearly presented, and the transition from fixed-depth looped Transformers to elastic-depth design feels natural.
* Experiments are reasonably comprehensive, evaluating both variable loop lengths and the effect of the proposed shortcut-consistency loss.
* The main claims are well supported

Weaknesses

* The degree of novelty is moderate. While the proposed elastic-depth formulation and shortcut-consistency loss are well designed, they extend existing time-modulated looped Transformer frameworks rather than introducing a fundamentally new paradigm.
* The paper does not provide theoretical intuition or analysis explaining why combining $t$ and $\Delta t$ through sinusoidal modulation is a good choice here
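To make the mechanism in the last point concrete, here is a minimal sketch of sinusoidal features for the pair ($t$, $\Delta t$). It uses the standard sinusoidal embedding of a scalar; how the paper actually combines the two signals is not stated in this excerpt, so concatenation, the embedding width, and the names `sinusoidal_embedding` and `conditioning_vector` are assumptions for illustration.

```python
import math


def sinusoidal_embedding(value, dim, max_period=10000.0):
    """Standard sinusoidal features of a scalar: cosines, then sines."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    cos_part = [math.cos(value * f) for f in freqs]
    sin_part = [math.sin(value * f) for f in freqs]
    return cos_part + sin_part


def conditioning_vector(t, dt, dim=8):
    """Embed t and dt separately and concatenate them.

    Assumption: concatenation is used so that (t, dt) and (dt, t) stay
    distinguishable; the paper may combine them differently.
    """
    return sinusoidal_embedding(t, dim) + sinusoidal_embedding(dt, dim)


vec = conditioning_vector(t=0.5, dt=0.25)
print(len(vec))
```

In a full model such a vector would typically pass through a small MLP before modulating the shared block, but that step is likewise an assumption here.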

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques