On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding

Kevin Xu; Issei Sato

arXiv:2410.01405·cs.LG·June 6, 2025

Open Access

TL;DR

This paper analyzes the expressive power of Looped Transformers. It establishes their approximation rate for sequence-to-sequence functions and proposes per-loop scaling parameters conditioned on timestep encoding, with experiments that support the theory.

Contribution

The paper provides the first theoretical approximation rate for Looped Transformers and introduces per-loop scaling parameters, conditioned on timestep encoding, to improve their expressive power.

Findings

01. Increasing the number of loops improves performance.

02. Timestep encoding yields further gains.

03. The theoretical analysis aligns with the experimental results.

Abstract

Looped Transformers provide advantages in parameter efficiency, computational capabilities, and generalization for reasoning tasks. However, their expressive power regarding function approximation remains underexplored. In this paper, we establish the approximation rate of Looped Transformers by defining the modulus of continuity for sequence-to-sequence functions. This reveals a limitation specific to the looped architecture. That is, the analysis prompts the incorporation of scaling parameters for each loop, conditioned on timestep encoding. Experiments validate the theoretical results, showing that increasing the number of loops enhances performance, with further gains achieved through the timestep encoding.
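The idea of reusing one block across loops while rescaling it per loop can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the sinusoidal `timestep_encoding`, the linear map `w_scale`, and the feed-forward-only `shared_block` (attention omitted) are all hypothetical choices made to keep the example short.

```python
import math
import random


def timestep_encoding(t, dim):
    """Sinusoidal encoding of loop index t (an assumed choice for illustration)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def matvec(vec, mat):
    """Row vector times matrix, with mat given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def shared_block(token, w1, w2):
    """Weight-tied token-wise feed-forward block (attention omitted for brevity)."""
    hidden = [max(h, 0.0) for h in matvec(token, w1)]
    return matvec(hidden, w2)


def looped_forward(tokens, w1, w2, w_scale, num_loops):
    """Apply the same block num_loops times with a timestep-conditioned scale."""
    d_model = len(tokens[0])
    for t in range(num_loops):
        # Hypothetical per-loop scale vector: 1 + linear map of the encoding.
        scale = [1.0 + s for s in matvec(timestep_encoding(t, d_model), w_scale)]
        tokens = [
            [x + s * u for x, s, u in zip(tok, scale, shared_block(tok, w1, w2))]
            for tok in tokens
        ]
    return tokens


def random_matrix(rng, rows, cols, std=0.1):
    """Small Gaussian matrix as a list of rows, used only to build toy weights."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, d_hidden = 4, 8, 16
tokens = random_matrix(rng, seq_len, d_model, std=1.0)
w1 = random_matrix(rng, d_model, d_hidden)
w2 = random_matrix(rng, d_hidden, d_model)
w_scale = random_matrix(rng, d_model, d_model)

out = looped_forward(tokens, w1, w2, w_scale, num_loops=3)
print(len(out), len(out[0]))  # sequence length and model width are preserved
```

The weights `w1` and `w2` are shared by every loop, so the only loop-dependent quantity is `scale`. Without it, each iteration would apply exactly the same map, which is the kind of restriction the per-loop scaling is meant to relax.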

Taxonomy

Topics: Neural Networks and Reservoir Computing · Neural Networks and Applications · Advanced Memory and Neural Computing