AbbIE: Autoregressive Block-Based Iterative Encoder for Efficient Sequence Modeling

Preslav Aleksandrov; Meghdad Kurmanji; Fernando Garcia Redondo; David O'Shea; William Shen; Alex Iacob; Lorenzo Sani; Xinchi Qiu; Nicola Cancedda; Nicholas D. Lane

arXiv:2507.08567·cs.LG·August 8, 2025


TL;DR

AbbIE introduces a recursive, block-based encoder that improves language modeling efficiency and performance by enabling dynamic compute scaling at test time without specialized training.

Contribution

The paper presents a recursive encoder architecture that is trained with a small, fixed number of iterations yet generalizes to arbitrary iteration lengths at test time, so compute can be scaled dynamically to improve language modeling.

Findings

01 Up to 12% improvement in zero-shot in-context learning

02 Up to 5% reduction in language perplexity

03 Effective on models up to 350M parameters

Abstract

We introduce the Autoregressive Block-Based Iterative Encoder (AbbIE), a novel recursive generalization of the encoder-only Transformer architecture, which achieves better perplexity than a standard Transformer and allows for the dynamic scaling of compute resources at test time. This simple, recursive approach complements scaling large language model (LLM) performance through parameter and token counts. AbbIE performs its iterations in latent space, but unlike latent reasoning models, it does not require a specialized dataset or training protocol. We show that AbbIE generalizes upward (that is, to arbitrary iteration lengths) at test time while using only 2 iterations during training, far outperforming alternative iterative methods. AbbIE's ability to scale its computational expenditure based on the complexity of the task gives it an up to 12% improvement in…
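The idea of reusing one block for a variable number of latent-space iterations can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_block` is a hypothetical stand-in for a stack of Transformer layers, and `iterative_encode` and `num_iterations` are names introduced here purely for illustration. The only point shown is that the iteration count is a runtime argument, so a model trained with 2 iterations could, in principle, be run with more at test time.

```python
import math
from typing import Callable, List

Vector = List[float]


def toy_block(hidden: Vector) -> Vector:
    """Hypothetical stand-in for the shared block (assumption, not AbbIE's real block).

    Applies a small residual update with a bounded nonlinearity so that
    repeated application stays numerically well behaved.
    """
    return [h - 0.5 * math.tanh(h) for h in hidden]


def iterative_encode(
    block: Callable[[Vector], Vector],
    hidden: Vector,
    num_iterations: int,
) -> Vector:
    """Apply the same block to the latent state `num_iterations` times."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")
    for _ in range(num_iterations):
        # The same parameters (here: the same function) are reused each pass.
        hidden = block(hidden)
    return hidden


# The same block, run with the training-time budget and with a larger test-time budget.
initial_state = [1.0, -2.0]
train_time_state = iterative_encode(toy_block, initial_state, num_iterations=2)
test_time_state = iterative_encode(toy_block, initial_state, num_iterations=8)
```

Because the block's weights are shared across passes, increasing `num_iterations` adds compute without adding parameters, which is the trade-off the abstract describes as dynamic test-time scaling.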
