Inner Loop Inference for Pretrained Transformers: Unlocking Latent Capabilities Without Training

Jonathan Lys; Vincent Gripon; Bastien Pasdeloup; Axel Marmoret; Lukas Mauch; Fabien Cardinaux; Ghouthi Boukli Hacene

arXiv:2602.14759·cs.LG·March 3, 2026

TL;DR

This paper introduces a simple test-time looping method for pretrained Transformers that enhances their latent representations and improves accuracy without additional training.

Contribution

The paper proposes inference-time inner looping: a selected range of blocks in a pretrained model is re-applied repeatedly to prolong refinement of the latent representation, with no additional training.

Findings

01. Modest but consistent accuracy improvements across benchmarks

02. More stable latent state evolution observed

03. Continued semantic refinement through looping

Abstract

Deep learning architectures, and Transformers in particular, are conventionally viewed as a composition of layers. In practice, each layer's output is often the sum of two contributions: a residual path that copies the input, and the output of a Transformer block. As a consequence, the inner representations (i.e., the inputs of these blocks) can be interpreted as iterative refinements of a propagated latent representation. Under this lens, many works suggest that the inner space is shared across layers, meaning that tokens can be decoded at early stages. Mechanistic interpretability goes even further, conjecturing that some layers act as refinement layers. Following this path, we propose inference-time inner looping, which prolongs refinement in pretrained off-the-shelf language models by repeatedly re-applying a selected block range. Across multiple benchmarks, inner looping yields…
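To make the idea in the abstract concrete, here is a minimal sketch of re-applying a selected block range at inference time. It is an illustration under stated assumptions, not the authors' implementation: the toy block, the function names (`make_toy_block`, `forward_with_inner_loop`) and the parameters (`start`, `end`, `n_loops`) are hypothetical, and a real Transformer block would involve attention, normalization, and caching details that are omitted here. Each toy block follows the residual form described above, returning its input plus an update.

```python
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_toy_block(scale: float) -> Block:
    """Build a toy residual block (hypothetical stand-in for a Transformer block).

    The block returns x + update(x), where the update nudges every
    coordinate toward 1.0. Only the residual structure matters here.
    """
    def block(x: Vector) -> Vector:
        return [xi + scale * (1.0 - xi) for xi in x]
    return block


def forward_with_inner_loop(
    blocks: List[Block],
    x: Vector,
    start: int,
    end: int,
    n_loops: int = 1,
) -> Vector:
    """Run all blocks in order, applying blocks[start:end] n_loops times.

    With n_loops == 1 this is the ordinary forward pass.
    No weights are changed; only the order of application differs.
    """
    if not 0 <= start < end <= len(blocks):
        raise ValueError("need 0 <= start < end <= len(blocks)")
    if n_loops < 1:
        raise ValueError("n_loops must be at least 1")

    for block in blocks[:start]:          # blocks before the looped range
        x = block(x)
    for _ in range(n_loops):              # repeated refinement
        for block in blocks[start:end]:
            x = block(x)
    for block in blocks[end:]:            # remaining blocks, applied once
        x = block(x)
    return x


if __name__ == "__main__":
    blocks = [make_toy_block(0.5) for _ in range(4)]
    baseline = forward_with_inner_loop(blocks, [0.0], start=1, end=3, n_loops=1)
    looped = forward_with_inner_loop(blocks, [0.0], start=1, end=3, n_loops=2)
    print("baseline:", baseline)
    print("looped:  ", looped)
```

In this toy setting the looped pass simply applies two extra block evaluations, so inference cost grows with `n_loops` times the length of the looped range. Which range to loop and how many times are choices this sketch leaves open.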

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning