On the Expressive Power of Floating-Point Transformers

Sejun Park; Yeachan Park; Geonho Hwang

arXiv:2601.16450·cs.LG·January 26, 2026

TL;DR

This paper investigates the expressive power of floating-point transformers, revealing their capabilities and limitations in representing permutation-equivariant and non-equivariant functions under finite-precision arithmetic.

Contribution

The paper extends the theory of transformer expressivity from exact real arithmetic to floating-point arithmetic, analyzing how finite precision changes which functions transformers can represent, including the effect of positional encoding.

Findings

1. Floating-point transformers can represent some non-permutation-equivariant functions, even without positional encoding.

2. They can represent all permutation-equivariant functions when the sequence length is bounded.

3. When the sequence length is large, they can no longer represent all permutation-equivariant functions.
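
One intuition for why sequence length matters, offered here as an illustration and not as the paper's proof: a finite-precision accumulator stops changing once the running total is large relative to the increment. The sketch below emulates IEEE 754 half-precision addition with Python's `struct` module and counts tokens by adding 1 per position. The helper names `round_fp16` and `fp16_count` are hypothetical, and the choice of half precision is an assumption made only to keep the loop short.

```python
import struct

def round_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fp16_count(n: int) -> float:
    """Add 1.0 n times, rounding to half precision after every addition.

    Hypothetical stand-in for a sum over n sequence positions.
    """
    total = 0.0
    for _ in range(n):
        total = round_fp16(total + 1.0)
    return total

print(fp16_count(1000))  # exact while the count fits in the 11-bit significand
print(fp16_count(2048))
print(fp16_count(4096))  # same value as for 2048: the accumulator has stalled
```

Past 2048 the spacing between adjacent half-precision values is 2, so adding 1 rounds back to the same number, and sequences of length 2048 and 4096 become indistinguishable to this accumulator. Wider formats only move the threshold; they do not remove it.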

Abstract

The study on the expressive power of transformers shows that transformers are permutation equivariant, and they can approximate all permutation-equivariant continuous functions on a compact domain. However, these results are derived under real parameters and exact operations, while real implementations on computers can only use a finite set of numbers and inexact machine operations with round-off errors. In this work, we investigate the representability of floating-point transformers that use floating-point parameters and floating-point operations. Unlike existing results under exact operations, we first show that floating-point transformers can represent a class of non-permutation-equivariant functions even without positional encoding. Furthermore, we prove that floating-point transformers can represent all permutation-equivariant functions when the sequence length is bounded, but they…
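
The gap between exact and machine arithmetic described in the abstract can be seen in a single sum. Real-number addition is associative and commutative, so a sum over tokens does not depend on their order; floating-point addition rounds after every step, so it can. The following is a minimal sketch of that effect in IEEE 754 double precision. It is one plausible source of order dependence, not necessarily the construction used in the paper, and `left_to_right_sum` is a hypothetical helper. An explicit loop is used because Python's built-in `sum` may apply compensated summation to floats.

```python
def left_to_right_sum(values):
    """Sum floats strictly left to right, rounding after each addition."""
    total = 0.0
    for v in values:
        total += v
    return total

tokens = [1e16, 1.0, -1e16]
permuted = [1e16, -1e16, 1.0]  # same multiset of values, different order

print(left_to_right_sum(tokens))    # 0.0: the 1.0 is absorbed by 1e16
print(left_to_right_sum(permuted))  # 1.0: the large terms cancel first
```

Both lists contain the same three values, yet the computed sums differ, so a layer that aggregates tokens this way can react to their order even though no positional information was supplied.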

Taxonomy

Topics: Numerical Methods and Algorithms · Ferroelectric and Negative Capacitance Devices · Parallel Computing and Optimization Techniques