How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models

Kristian Schwethelm; Daniel Rueckert; Georgios Kaissis

arXiv:2604.21106·cs.LG·May 8, 2026

TL;DR

This paper measures how much one shared recurrence in a looped transformer is worth relative to a unique block, comparing validation loss at matched training compute, and introduces a recurrence-equivalence exponent φ as a diagnostic.

Contribution

It introduces a joint scaling law with a recurrence-equivalence exponent φ that quantifies how much capacity shared recurrences provide compared with unique blocks in transformer models.

Findings

01

The recurrence-equivalence exponent φ is approximately 0.46, indicating partial equivalence between shared recurrences and unique blocks.

02

Replacing unique blocks with shared recurrences increases validation loss at the same training compute.

03

Truncated backpropagation lowers φ, showing poorer training of the loop mechanism, while hyperconnections raise φ, indicating capacity gains.

Abstract

We measure how much one recurrence is worth to a looped (depth-recurrent) transformer, in equivalent unique parameters. From an iso-depth pretraining sweep across recurrence counts $r \in \{1, 2, 4, 8\}$ spanning $\sim 50\times$ in training compute, we fit a joint scaling law $L = E + A \left(N_{\text{once}} + r^{\varphi} N_{\text{rec}}\right)^{-\alpha} + B D^{-\beta}$ and measure a recurrence-equivalence exponent $\varphi = 0.46$. Intuitively, $\varphi$ tells us whether looping a block $r$ times is equivalent in validation loss to $r$ unique blocks of a non-looped model (full equivalence, $\varphi = 1$) or to a single block run repeatedly with no capacity gain ($\varphi = 0$). Our $\varphi = 0.46$ sits in between, so replacing unique blocks with shared recurrences increases validation loss at matched training compute. For example, at $r = 4$ a 410M looped model performs on par with a 580M…
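To make the functional form in the abstract concrete, here is a minimal Python sketch of the effective parameter count $N_{\text{once}} + r^{\varphi} N_{\text{rec}}$ and the joint law built on it. Only $\varphi = 0.46$ and the form of the equation come from the abstract. The function names are hypothetical, and the coefficients $E$, $A$, $\alpha$, $B$, $\beta$ are not given in this excerpt, so they are left as required arguments rather than filled in.

```python
# Sketch of the joint scaling law from the abstract:
#   L = E + A * (N_once + r**phi * N_rec)**(-alpha) + B * D**(-beta)
# Function names are illustrative assumptions, not taken from the paper's repo.

PHI = 0.46  # recurrence-equivalence exponent reported in the abstract


def effective_params(n_once: float, n_rec: float, r: int, phi: float = PHI) -> float:
    """Effective unique-parameter count of a looped model.

    n_once: parameters applied once per forward pass
    n_rec:  parameters inside the block that is looped r times
    """
    if r < 1:
        raise ValueError("recurrence count r must be >= 1")
    return n_once + (r ** phi) * n_rec


def predicted_loss(n_once: float, n_rec: float, r: int, d: float, *,
                   E: float, A: float, alpha: float, B: float, beta: float,
                   phi: float = PHI) -> float:
    """Validation loss under the joint law; d is the data term D.

    The fitted coefficients are not stated in this excerpt and must be
    supplied by the caller.
    """
    n_eff = effective_params(n_once, n_rec, r, phi)
    return E + A * n_eff ** (-alpha) + B * d ** (-beta)


if __name__ == "__main__":
    # How many unique blocks' worth of capacity r recurrences provide.
    for r in (1, 2, 4, 8):
        print(f"r={r}: r**phi = {r ** PHI:.3f}")
```

Under this form, the factor $r^{\varphi}$ is the multiplier on the looped parameters: with $\varphi = 1$ it equals $r$ (each recurrence counts as a full unique block), and with $\varphi = 0$ it equals 1 (looping adds no capacity). At $\varphi = 0.46$ and $r = 4$ the multiplier is roughly 1.9, well short of 4.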

Code & Models

Repositories

kschwethelm/looped-lm-scaling (GitHub)