Exact Sequence Interpolation with Transformers

Albert Alcalde; Giovanni Fantuzzi; Enrique Zuazua

arXiv:2502.02270·cs.LG·May 14, 2026

TL;DR

This paper proves that transformers can exactly interpolate finite datasets of sequences in R^d. It gives an explicit construction with complexity estimates, established first for hardmax attention and then extended to softmax attention.

Contribution

The authors present a constructive method, based on low-rank attention matrices, showing that transformers can exactly interpolate sequence datasets with complexity bounds that are independent of the input sequence length.

Findings

01

Transformers can exactly interpolate sequence datasets with complexity independent of the input sequence length.

02

The explicit construction uses low-rank parameter matrices in self-attention, a common feature of practical transformer implementations.

03

Provides convergence guarantees for training transformers to global minima.
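
As a rough illustration of two ideas named above (low-rank attention matrices, and hardmax attention as the setting that is later extended to softmax), the following minimal sketch compares hardmax weights with softmax weights at a large inverse temperature. It is not the authors' construction. The score form `<A q, x>`, the rank-one factorization `A = u v^T`, the uniform tie-breaking rule in `hardmax_weights`, and all function names and numbers are assumptions made for illustration only.

```python
import math


def rank_one_scores(query, tokens, u, v):
    """Attention scores <A q, x_k> for a rank-one matrix A = u v^T (assumed form).

    With A = u v^T we have A q = u (v . q), so <A q, x> = (v . q) * (u . x).
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    return [dot(v, query) * dot(u, x) for x in tokens]


def softmax_weights(scores, beta):
    """Softmax attention weights with inverse temperature beta."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(beta * (s - shift)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hardmax_weights(scores):
    """Uniform weights over the maximizing tokens, zero elsewhere (assumed tie rule)."""
    top = max(scores)
    winners = [1.0 if s == top else 0.0 for s in scores]
    count = sum(winners)
    return [w / count for w in winners]


# Hypothetical toy data: three tokens in R^2 and a rank-one attention matrix.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
u, v = [1.0, 0.5], [1.0, 1.0]

scores = rank_one_scores(tokens[0], tokens, u, v)
hard = hardmax_weights(scores)
soft = softmax_weights(scores, beta=200.0)

# Largest entrywise difference between the two weight vectors.
gap = max(abs(h - s) for h, s in zip(hard, soft))
```

In this toy example the first token has the strictly largest score, so the hardmax weights put all mass on it, and the softmax weights at `beta=200.0` differ from them only by a very small amount. This is the generic sense in which softmax with a large inverse temperature approximates hardmax; the paper's quantitative extension is not reproduced here.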

Abstract

We prove that transformers can exactly interpolate datasets of finite input sequences in $\mathbb{R}^{d}$, $d \geq 2$, with corresponding output sequences of smaller or equal length. Specifically, given $N$ sequences of arbitrary but finite lengths in $\mathbb{R}^{d}$ and output sequences of lengths $m^{1}, \dots, m^{N} \in \mathbb{N}$, we construct a transformer with $O(\sum_{j=1}^{N} m^{j})$ blocks and $O(d \sum_{j=1}^{N} m^{j})$ parameters that exactly interpolates the dataset. Our construction provides complexity estimates that are independent of the input sequence length, by alternating feed-forward and self-attention layers and by capitalizing on the clustering effect inherent to the latter. Our novel constructive method also uses low-rank parameter matrices in the self-attention mechanism, a common feature of practical transformer implementations. These results are…
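
To make the scaling in the abstract concrete, here is a small sketch that evaluates the quantities inside the two O(·) bounds for a given list of output lengths. The constants hidden by the O-notation are not stated in the abstract, so the function returns only the scale terms, not actual block or parameter counts. The function name and the sample numbers are hypothetical.

```python
def interpolation_scale(output_lengths, d):
    """Return the terms inside the O(.) bounds quoted in the abstract.

    output_lengths: the output sequence lengths m^1, ..., m^N.
    d: the token dimension (the abstract assumes d >= 2).

    Input sequence lengths are deliberately not an argument, because the
    bounds do not depend on them.
    """
    if d < 2:
        raise ValueError("the abstract assumes d >= 2")
    total_outputs = sum(output_lengths)
    return {
        "blocks_scale": total_outputs,      # O(sum_j m^j) blocks
        "params_scale": d * total_outputs,  # O(d * sum_j m^j) parameters
    }


# Hypothetical dataset: N = 3 sequences with output lengths 2, 1 and 3 in R^4.
scale = interpolation_scale([2, 1, 3], d=4)
```

Two datasets with the same output lengths and the same `d` yield the same scale terms however long their input sequences are, which is the sense in which the estimates are independent of the input sequence length.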
